It may be time for us to reconsider some of our previous assumptions regarding data integration. For decades, we’ve presumed that the best method for managing data was through strict conformance and control. We viewed the enterprise as static, remaining stable once properly defined. We’ve discovered that quite the opposite is true. Today we are facing more complexity in data integration than ever before - more data sources, greater volumes of data, more solution paradigms to deal with and greater expectations for cross-domain data exchange. Moreover, data integration has become the lynchpin within holistic architectures based upon services and sophisticated business process orchestration. Agile data architecture is not a vendor-focused solution; it is a technology-agnostic philosophy designed to address the new realities of enterprise integration. It provides a context and methodology that aligns technical solutions with the expectations that drive them. Agile data architecture provides a solution focus that pulls the big picture together without losing sight of those whom it serves, the end users.


Sometimes it is necessary to step back and reassess just what is really going on to discern the big picture. Few topics are more complex than enterprise integration. We tend to address it by breaking it up into subcategories and areas of specialization or best practices. Data architecture represents one of three primary architecture categories (data, application and process) and contains within it perhaps a dozen or so areas of specialization (warehouses, master data management [MDM], operational data stores [ODS], business intelligence [BI] and so forth). Those areas of specialization tend to be closely aligned with software solutions, but some also represent communities of practice dedicated to specific integration techniques. Each of these specializations arose to address specific business issues or integration challenges. However, what seems to be lacking is a unifying mechanism that ties all of these elements together within a relevant context. In other words, what is missing includes both the true purpose for integration and the means by which we might achieve it. Without that context, how can we hope to solve even more complex situations, such as service-oriented architecture (SOA) transformation and cross-domain data integration?


In many ways, integration is a philosophical rather than technical problem. All good philosophies begin with one central premise, a premise that is both meaningful enough to derive further elaboration and flexible enough to cover a wide spectrum of related effort. The philosophy posited in this article is dedicated to data architecture because data is the lifeblood that links all other elements of all other potential architectures together. Agile data architecture is not just philosophy, though; it also represents the beginnings of both a data and enterprise integration methodology designed to complement rather than to combat complexity.


A Central Premise


While our central premise seems obvious upon first glance, in many ways, it isn’t. The premise is this – users are the reason why we build solutions, and users are our best resource for determining how to build them. Our focus on the user determines or ought to determine all aspects of solution design, even those which we do not generally consider to be the domain of the end user. Without a user-centric focus, technical decisions become arbitrary, and ultimately, the solution divorces itself from the reality of its conception. User-centricity is the primary motivating force behind the development of all agile solutions. The user provides:

  • Immediacy – the desire for near-term real-world capability.
  • Relevance and context.
  • Performance expectations.
  • Direction, domain knowledge and the logic behind every solution.

End users are the great democratizing force behind solution integration and development. Universal adoption is not possible without them – while we developers may be great builders and architects, sometimes what the people need are efficient and utilitarian structures rather than monuments to our own brilliance.


What is Agile Data Architecture?


Agile data architecture builds upon the core premise and represents the combination of a dynamic set of interrelated best practices rather than a standardized or single architectural approach. More importantly, it characterizes the methodology which allows that set of interrelated best practices to be coordinated and also provides us the means by which to measure our success with the resulting solution. Much of the strength behind this approach is based upon the ability of this philosophy to accommodate evolving technologies and architectural best practices. Today’s recommended solution will change, and knowing it will change will drive decisions that impact performance, cost and schedules.


The best way to illustrate the concept of agile data architecture, though, is by example. If one were to take a survey of the best practices that together constituted a holistic data architecture solution, the following examples might be chosen to illustrate the concept:

  • Actionable enterprise architecture (AEA) - the foundation (architectural context perspective).
  • Federated data orchestration - data source perspective.
  • Enterprise master data management (eMDM) - data optimization perspective.
  • Agile business intelligence (ABI) - data exploitation perspective.

We will explore each of these example best practices in turn.


Actionable enterprise architecture (the foundation – architectural context perspective). Every project ought to start with the recognition of its context within whatever environment it resides. AEA provides that context and a tangible way to connect architecture layers (EA, segment and implementation) and perspectives (application, process, data). One of the main reasons that large integration projects fail is due to the inability to successfully map the various architectures within a meaningful combined picture. Every agile data architecture begins here.


Figure 1: Agile Data Architecture – Foundation: Actionable Enterprise Architecture


Federated data orchestration (data source perspective). The best way to explain this is through analogy. Would anyone advocate that all of the resources represented by URLs on the Internet be combined into one massive data source? Would anyone recommend combining all fiction ever written into one gigantic novel? No one would suggest such ideas. We use indices to help locate materials on the Web rather than integrating them all, and even though much fiction is similar, we’d never suggest that all fiction should be condensed into one representative fictional model. Data federation recognizes a basic premise about data that has been overlooked by many contemporary data management solutions – that integration does not and never has necessarily implied standardization.


The true problem has always been the difficulty in effectively separating similar types of data in order to gain an accurate view of the domain being examined. Granted, one way to solve this is through data source consolidation; however, that approach is simply not scalable or manageable beyond a certain threshold. Orchestrated data federation represents a best practice dictated by common sense – the determination to allow data owners to collaboratively manage resources across domains based upon a shared set of rules rather than a shared single data model. This is an excellent example of a user-centric approach. One of the major shortcomings of massive data warehouse projects has been the lost connections between users and developers and resulting data integrity issues.
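To make the idea of a shared set of rules rather than a shared single data model more concrete, here is a minimal sketch. Everything in it is a hypothetical illustration, not part of any product or of the methodology itself: the `FederatedSource` class, the `federated_query` function, the two sample domains and their field names are all assumptions. Each data owner keeps a local model and publishes only the rules that map shared terms onto local fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class FederatedSource:
    """One owner-managed source plus the rules mapping shared terms to local fields."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # owner-controlled access to local data
    field_rules: Dict[str, str]            # shared term -> local field name

    def query(self, shared_fields: List[str]) -> List[Record]:
        results = []
        for row in self.fetch():
            # Translate each local row into shared terms; the source is left untouched.
            mapped = {term: row.get(self.field_rules[term])
                      for term in shared_fields if term in self.field_rules}
            mapped["_source"] = self.name  # keep provenance visible to the user
            results.append(mapped)
        return results


def federated_query(sources: List[FederatedSource],
                    shared_fields: List[str]) -> List[Record]:
    """Gather results from every source without consolidating them into one store."""
    combined: List[Record] = []
    for source in sources:
        combined.extend(source.query(shared_fields))
    return combined


# Two hypothetical domains, each with its own local model.
sales = FederatedSource(
    name="sales",
    fetch=lambda: [{"cust_id": 7, "cust_nm": "Acme"}],
    field_rules={"customer_id": "cust_id", "customer_name": "cust_nm"},
)
support = FederatedSource(
    name="support",
    fetch=lambda: [{"clientNumber": 7, "clientLabel": "Acme Corp"}],
    field_rules={"customer_id": "clientNumber", "customer_name": "clientLabel"},
)

rows = federated_query([sales, support], ["customer_id", "customer_name"])
```

In this sketch neither source changes its schema. Only the mapping rules are shared, and each result row records which domain it came from, so differences between domains stay visible instead of being standardized away.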


Figure 2: Federated Data Orchestration Facilitates Cross-Domain Integration


eMDM (data transformation and optimization perspective). What do your enterprise’s databases, applications and documents all have in common? Do they share a common data model? Not likely. But all of these resources do represent enterprise information and can be characterized and tracked using metadata. Some studies have shown that the time we spend looking for information resources gobbles up 25 to 30 percent of the average workday. Metadata is not just a technical consideration; it can define productivity in our knowledge economy.


MDM has arisen out of the pragmatic need to control metadata across disparate data sources and applications within a given enterprise. The problem thus far with MDM solutions is that they tend to be focused just within a single enterprise. The principles behind MDM, however, can be extended to accommodate semantic mediation between disparate enterprises or functional domains. Innovators are positioning MDM as the discovery fabric between service-oriented environments and the semantic governance layer within unique enterprise enclaves. This new role is referred to as eMDM. End users are the key to making this work because they are responsible for defining the semantic layer and the rules for discovery between domains.


Figure 3: Transformation and Optimization Perspective eMDM


In agile data architecture, the nature of data transformation takes on a new character, focusing on steps necessary to aid user-driven discovery and manipulation. Attempts to build complex logic and data integration at or near the data source level often represent developer assumptions about user need, usually without the benefit of full validation. Worse yet, using the old techniques, many of those assumptions end up being hard coded into solutions. The MDM best practice is a good illustration of how and why logic needs to migrate closer to the end user. The metadata we gather to help determine which types of data discovery occur most often is validated by users, then stored here and used to make other decisions regarding performance optimization within the federated data layer. In many ways, data integration is moving to more of a search engine model – data heuristics rather than data modification.


Agile business intelligence (data exploitation perspective). The Internet has redefined people’s expectations about information. Access to information has become easier, faster and, most importantly, more flexible. The search engine discovery interface is the ultimate ad hoc reporting tool in many ways. Over the past year, companies have been scrambling to imitate that interface on the desktop and even combine the two. Why is it so popular, even considering that the results of current search engines are not particularly accurate? The answer is simple; the user is in total control of the discovery process. A new generation of BI capabilities is bringing that user control to more sophisticated report generation tools with much more accurate results. This is agile BI.


From the developer’s perspective, instead of making assumptions, we use our relationship with the end users to let them drive and validate logic from their perspective. Agile BI is focused on utilizing ad hoc capabilities to drive discovery and determine optimization strategies. Agile BI provides the user toolset for influencing the rest of the agile data architecture. Based upon user queries and activities, we gather metadata, optimize caches and determine cross-domain mapping strategies. Some have referred to this capability as “operational BI”; however, that doesn’t capture the context of what’s really happening across the entire architecture. Agile BI is not merely another channel for managing BI; it is a new way to view the entire practice of BI. This best practice assumes that BI is of value to the whole enterprise, not merely 10 percent of it. This practice also assumes that for BI to work, the end user becomes a partner in the development process. Most importantly, this is where we gain most of our insights on the overall performance of our agile data architecture.
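The feedback loop described here, in which user queries produce metadata that then guides cache optimization, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `record_usage` and `cache_candidates`, the `(domain, term)` query shape and the `min_hits` threshold are all hypothetical, and a real deployment would collect far richer metadata.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Query = Tuple[str, str]  # (domain, shared term) requested by a user


def record_usage(log: Counter, queries: Iterable[Query]) -> None:
    """Accumulate usage metadata: how often each (domain, term) pair is requested."""
    log.update(queries)


def cache_candidates(log: Counter, min_hits: int) -> List[Query]:
    """Return the pairs requested at least min_hits times, most frequent first."""
    return [pair for pair, hits in log.most_common() if hits >= min_hits]


usage_log: Counter = Counter()
record_usage(usage_log, [
    ("sales", "customer_name"),
    ("support", "customer_name"),
    ("sales", "customer_name"),
    ("sales", "order_total"),
    ("sales", "customer_name"),
])

# Pairs that users actually ask for repeatedly become candidates for caching.
hot = cache_candidates(usage_log, min_hits=2)
```

The point of the design is that optimization decisions follow observed user behavior rather than developer assumptions. The threshold itself is something the user community would be expected to help set.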


Figure 4: Agile Business Intelligence – Data Exploitation Perspective


Measuring Agile Data Architectures


Perhaps the most important aspect of any methodology is how it empowers an organization’s ability to measure success. Agile data architecture is no exception to this rule and provides two core mechanisms to help measure success (both during the design and development process and after deployment): the success dialectic and key performance indicators (KPIs).


Prototypical Key Performance Indicators


KPIs are only valid if they represent definitions agreed to or determined by the end-user community. If the development community is allowed to develop KPIs on its own, it is very likely that they will not reflect true user sentiment. The disconnect often becomes apparent when outsiders view a situation with healthy project KPIs but few true users of the solution. The following list captures several prototypical agile data architecture KPIs:

  • Data accuracy (as defined by the user community);
  • Data accessibility (as defined by the user community);
  • Data discoverability (speed in access, accuracy in results, flexibility in determining discovery heuristics);
  • Flexibility and agility in data manipulation (precisely how long does it take for end users to implement and execute basic rules for data combination);
  • Solution availability (both in terms of capability iteration availability through development and solution availability after implementation).
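As a rough illustration of how user-defined KPIs such as these might be checked, the sketch below compares measured values against targets agreed with the user community. The KPI names, target values and measurements are invented for the example, and `KpiTarget` and `evaluate` are hypothetical helpers rather than part of any established tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KpiTarget:
    """A target agreed with the user community for one KPI."""
    name: str
    target: float
    higher_is_better: bool = True

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def evaluate(targets: List[KpiTarget], measured: Dict[str, float]) -> Dict[str, bool]:
    """Report, per KPI, whether the measured value meets the agreed target.

    A KPI with no measurement is reported as not met.
    """
    report = {}
    for kpi in targets:
        report[kpi.name] = kpi.name in measured and kpi.is_met(measured[kpi.name])
    return report


# Hypothetical targets, as they might be set by the end-user community.
targets = [
    KpiTarget("data_accuracy_pct", 98.0),
    KpiTarget("discovery_seconds", 5.0, higher_is_better=False),
    KpiTarget("rule_setup_minutes", 30.0, higher_is_better=False),
]
measured = {"data_accuracy_pct": 99.1, "discovery_seconds": 7.5}
report = evaluate(targets, measured)
```

Treating an unmeasured KPI as unmet is a deliberate choice in this sketch: a KPI nobody is measuring should not be allowed to look healthy.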

How Does Agile Data Architecture Empower SOA?


SOA represents a design philosophy dedicated to promoting abstraction, modularity, reuse and discoverability. These points can all be summarized through the concept of loose coupling, or the idea that explicitly designed integration limits architecture flexibility. Sounds familiar, doesn’t it? Agile data architecture is a parallel and complementary design philosophy and methodology to SOA and agile application development. Both can be mapped together within the larger actionable enterprise architecture. The true integrating force between these philosophies is user involvement and the ability to apply a shared semantic layer of understanding across all architectural elements. While SOA has long considered universal description, discovery and integration (UDDI) as its primary discoverability mechanism, the reality is that nearly all integration with an SOA environment is based upon data exchange and will ultimately be demonstrated through data exploitation interfaces (agile BI). Once SOA architects fully realize the implications of this revelation, then agile data architecture will become the facilitating mechanism for cross-domain data fusion and enterprise integration.


We are poised at an important crossroads. The pace of change has reached nearly exponential proportions. We must immediately determine methods and techniques to manage and exploit what would have been considered unimaginable volumes of data just a few years ago. We have a choice in how we view that challenge – we can either try to control something that has demonstrated time and time again that it can’t be controlled or we can instead employ a set of dynamic rules and practices that acknowledges and embraces evolutionary progress. The pace of change also makes any project scheduled to be in development for longer than two years risky; the odds are that in two years the technology and expectations will have changed enough to make portions of the original design irrelevant. Data architecture needs to be rapid to remain relevant. While we may not be able to construct a rigid, perfectly ordered model of our enterprise that will stand the test of time, we can, however, build flexible solutions that can deliver value immediately and adapt to our changing needs. The choice seems clear.
