The Next-Generation Data Tier: A Repository for Integration Assets

  • December 01 2004, 1:00am EST
Data tiers hold the promise of vastly simplifying application maintenance and reducing the number of redundant data stores in your organization.

The creation of an enterprise data tier is becoming a practical reality with today's standard architectures and interfaces. Creating a unified layer where applications can interact with the data they need promises to vastly simplify application maintenance and reduce the number of redundant data stores spread around the organization. To achieve this goal in an orderly and incremental fashion, we need to make some modest changes in the way we approach integration projects. Our data integration work today is creating precious new information assets for the organization, not just checking off a task in an application project. We need to create reusable definitions of integrated data, reusable integration logic and interfaces that match application requirements. As we complete each project, we can add these assets incrementally to the data tier and transform the way our companies develop and maintain applications. But to make it work, and to avoid becoming yet another silo, the data tier must be able to scale across the enterprise for potentially many applications and data sources. Enterprise information integration (EII) solutions can help us get there - if they are flexible and scalable enough.

Antidote to Integration Silos

With the complexity of data required by today's applications, it is becoming more difficult to justify carrying out data integration projects independently within one department or location, or to create a de facto silo based on one group's choice of a special-purpose technology. It's not the integration technology that's at issue here so much as the need to preserve and build on the integration work we do. When projects operate in a disjointed way, it is nearly impossible to reuse integration work, whether that work is manifested in a data mart or in application code. Enterprise productivity suffers as the same work is frequently redone in another part of the organization, and no one can build on the abstractions created within the project.

The problem is not new, nor is the typical suggested solution - create a single model of all data in the enterprise. However, the latter has not proven practical for many organizations. We'd prefer instead to create a flexible universal data access and integration architecture that can integrate both with existing data sources and with applications that need the data. This universal architecture - a "data tier" - can serve as one virtual data source (with logical and physical components) that provides applications with the data they need. Such a data tier is the best repository for those much-desired canonical views of data such as customer or order, which, while they might ultimately integrate data from many different physical sources, can present one view to the application. By placing these integrated views in a shared data tier, we make them visible and reusable across potentially many applications so each application has data produced in the same way. In short, we achieve our much-desired single view of the truth.

While this kind of enterprise data tier might sound like an impossible dream, it's within our reach today. Rational data integration within the scope of the project and reuse of that integration work does not require the entire enterprise to be integrated. There's no reason, for example, to integrate all the data into one enterprise model or data warehouse, or standardize on one integration approach. Companies will continue to take advantage of many different tools and approaches for transforming and integrating data, including warehouses, marts, EAI, ETL, XML, Web services, data cleansing tools and the like, along with flexible programming languages such as Java. The data tier needs to be integration technology-neutral - flexible enough to plug in multiple technologies so organizations can leverage all of them to apply their unique benefits to the projects for which they are appropriate. Companies will also continue to take advantage of data in many different forms including relational databases, structured data files such as CSV or XML, application data accessible through APIs or a Web service interface, URL-based data sources and others. The data tier must be able to bring in any data from anywhere and serve it up to applications.

A Key Resource for the Integration Project

While the data tier does not exist to serve any one project exclusively, it can be developed one project at a time. For each project, we can ask: What data is needed? Who will have access? And how should they have access? Then we can design those transformed and integrated views of data, making them available not only to our application, but to others that might need the same information.

The process of creating integrated views of data must be as easy as possible and must be an incremental process. To know how to map source data to our desired target result, we simply need access to the meta data for those sources, and we need to be able to, in turn, express the meta data for the result in ways that others can see and use. This is the most critical role of model-based design in the integration project, a role that is often forgotten. While it might be nice to know how all our enterprise data fits together, it's much more critical to know exactly which canonical view of customer is needed and how it should be (or has been) created from the existing data sources in different locations. This question of data lineage is becoming much more prominent as integration projects become more complex. With an enterprise data tier, the lineage of a given view of data must be immediately apparent.
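One way to picture lineage that is "immediately apparent" is to keep the mapping from source columns to target columns as data alongside the view itself. The following is only a minimal sketch under that assumption; the class names, the source systems (crm, billing) and the column names are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ColumnMapping:
    """One target column and where it comes from (hypothetical structure)."""
    target: str
    source: str          # written as "system.table.column"
    rule: str = "copy"   # free-text description of the transformation


@dataclass
class ViewDefinition:
    """Metadata for one integrated view, kept separate from the data."""
    name: str
    mappings: List[ColumnMapping] = field(default_factory=list)

    def lineage(self) -> Dict[str, str]:
        # Map each target column to its source column and transformation rule.
        return {m.target: f"{m.source} ({m.rule})" for m in self.mappings}

    def sources(self) -> List[str]:
        # Distinct source systems that feed this view, sorted by name.
        return sorted({m.source.split(".")[0] for m in self.mappings})


# Illustrative canonical customer view drawing on two assumed systems.
customer = ViewDefinition("customer", [
    ColumnMapping("customer_id", "crm.accounts.id"),
    ColumnMapping("name", "crm.accounts.display_name", "trim"),
    ColumnMapping("balance", "billing.ledger.amount", "sum per customer"),
])

for column, origin in customer.lineage().items():
    print(f"{column} <- {origin}")
```

Because the mappings are plain data, another project can inspect how this view of customer was built before deciding whether to reuse it.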

From the perspective of the individual application, the data tier is an application tier containing business rules governing one or more views of data. The code's task should be as simple as accessing the views by name. Why concern the developer with the specific location or technology employed by back-end data sources, specific security or drivers? Leave that to the data tier. The developer's work becomes finding the right data views that have already been created or designing the right integrated views if needed, and deciding how those views should be made available.
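To make "accessing the views by name" concrete, here is a small sketch of what the application side might look like. It assumes a hypothetical DataTier registry and two stand-in back-end sources; none of these names refer to a real library, and a production data tier would also handle security, drivers and connections behind the same interface.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class DataTier:
    """Hypothetical registry that maps view names to loader functions."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, loader: Callable[[], Iterable[Row]]) -> None:
        self._views[name] = loader

    def view(self, name: str) -> List[Row]:
        # The caller supplies only a name; source details stay inside the tier.
        if name not in self._views:
            raise KeyError(f"no such view: {name}")
        return list(self._views[name]())


# Stand-ins for two physical sources (assumed for illustration).
def _crm_rows() -> List[Row]:
    return [{"id": 1, "name": "Acme"}]


def _billing_rows() -> List[Row]:
    return [{"cust_id": 1, "balance": 250.0}]


def customer_view() -> Iterator[Row]:
    # Integration logic: join CRM accounts to billing balances by customer id.
    balances = {r["cust_id"]: r["balance"] for r in _billing_rows()}
    for r in _crm_rows():
        yield {
            "customer_id": r["id"],
            "name": r["name"],
            "balance": balances.get(r["id"]),
        }


tier = DataTier()
tier.register("customer", customer_view)

# Application code: no knowledge of CRM or billing is needed here.
rows = tier.view("customer")
print(rows)
```

The integration logic lives in one registered function, so a second application asking for "customer" gets data produced in the same way.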

Many Access Methods, One View of the Truth

In any application project, the question of exactly how an application will need to work with the data drives the way the data is made available. This design choice has received little attention, but becomes important as we consider how to formulate our data tier. We need to know not only in which format the data must be delivered, but also through which interface data should be exposed to the application. Some applications (and users) must be able to issue arbitrary SQL against real or virtual relational tables; therefore, data must be made available as database tables for those applications. Other applications just need a result. Some expect to interact with XML files, and still others expect Web services. One logical model of the data should be deployable in multiple ways to suit application requirements.
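As a rough illustration of one logical model deployed several ways, the sketch below exposes the same rows as a relational table, as an XML document and through a service-style call. The order data and function names are invented for the example, and an in-memory SQLite database stands in for whatever relational engine an organization actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List

# One logical "orders" view (illustrative data only).
ORDERS: List[Dict[str, object]] = [
    {"order_id": 1, "customer": "Acme", "total": 120.5},
    {"order_id": 2, "customer": "Globex", "total": 75.0},
]


def as_table(rows: List[Dict[str, object]]) -> sqlite3.Connection:
    """Deploy the view as a table for applications that issue SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :total)", rows
    )
    return conn


def as_xml(rows: List[Dict[str, object]]) -> str:
    """Deploy the same view as an XML document."""
    root = ET.Element("orders")
    for row in rows:
        item = ET.SubElement(root, "order")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def as_service(rows: List[Dict[str, object]], customer: str) -> List[Dict[str, object]]:
    """Deploy the same view as a call that just returns a result."""
    return [r for r in rows if r["customer"] == customer]


conn = as_table(ORDERS)
print(conn.execute("SELECT customer, total FROM orders").fetchall())
print(as_xml(ORDERS))
print(as_service(ORDERS, "Acme"))
```

The data definition appears once; only the delivery interface varies with what the consuming application expects.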

For some production data sources (and their stewards, the DBAs), the idea of ad hoc SQL access is anathema. Any queries that will be issued against their databases must be vetted, optimized and tightly regulated. For these data sources, there's no substitute for controlled procedural access to the data - more of a service (or procedure) view than a relational view. The application makes a call, and the service executes. Because many opportunities for optimization exist, performance may be vastly better with service views than with ad hoc relational access. The needs of data owner, application and end user are served as efficiently as possible. Therefore, in situations where SQL access is not an absolute requirement, why create tables?
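A service view of this kind can be sketched as a fixed catalog of vetted, parameterized queries, with anything outside the catalog refused. The example below is an assumption-laden illustration: the OrderService class, the query names and the sample data are hypothetical, and SQLite merely stands in for a production database.

```python
import sqlite3
from typing import List, Tuple

# Queries a DBA has reviewed in advance (hypothetical catalog).
VETTED_QUERIES = {
    "orders_for_customer":
        "SELECT order_id, total FROM orders WHERE customer = ? ORDER BY order_id",
    "customer_total":
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
}


class OrderService:
    """Procedural access only: callers name a vetted query and pass parameters."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def call(self, name: str, *params: object) -> List[Tuple]:
        if name not in VETTED_QUERIES:
            # Ad hoc SQL never reaches the production source.
            raise PermissionError(f"query not vetted: {name}")
        return self._conn.execute(VETTED_QUERIES[name], params).fetchall()


# Sample data standing in for a production source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 100.5), (2, "Globex", 75.0), (3, "Acme", 50.0)],
)

service = OrderService(conn)
print(service.call("orders_for_customer", "Acme"))
print(service.call("customer_total", "Acme"))
```

Because each query is known ahead of time, it can be tuned and indexed for, which is where the performance advantage of service views over ad hoc access tends to come from.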

For other situations, there's no substitute for SQL access to tables. After all, many applications only speak SQL, and they need tables to talk to. Certainly when you are prototyping a new application, it is helpful to have the flexibility of defining your queries on the fly. When you are ready to make the application perform in production, however, a service might be the way to go. The point is that developers and integrators need to be able to mix and match these different types of data access to meet a variety of business requirements, application requirements and performance requirements. Some users may have a legitimate need to issue ad hoc queries against a certain table, while others just want to issue one specific query that can be predefined, and still others just need a weekly snapshot report. These aren't different data requirements at the logical level - they are different access requirements. If we understand those requirements and have the flexibility to address them individually, we can optimize delivery of data throughout the organization.

A modular data tier makes it possible to provide data views through one interface or through many different interfaces if appropriate, while simplifying and streamlining application code. The set of views can grow incrementally, project by project; the resulting integration assets have the potential to significantly reduce the amount of integration work needed down the road and increase the visibility of enterprise data assets and the ways they are used.

Will It Scale?

One very important requirement of the data tier is that it eventually be able to scale across a large enterprise. This means handling multiple divisions or locations of a company that may have different technology infrastructures, operating across firewalls, supporting large data sets, and being able to grow incrementally to handle more users and more traffic. A data tier that does not scale will remain an application silo, a curiosity that may once have helped some developers get a project done more quickly or more cost-effectively, but that means nothing to those who come after. We can no longer afford such silos.

Flexibility Is Key

Many organizations have seen the promise of the enterprise data tier and are working toward a solution. They recognize that they need more than just the latest integration tool to create the data tier. Today we are frequently limited by the too-specialized nature of a one-trick integration technology. With a hammer in hand, everything can look like a nail. The reality is each integration technology has its pros and cons. With a hammer, wrench and three types of screwdrivers all available to our integration projects, we can match the tool to the problem we're trying to solve.

Will EII Help Us Get There?

Today's EII solutions attempt to solve a big problem for organizations: runaway cost and complexity. The application complexity problem was solved by reducing custom development, buying more off-the-shelf software and outsourcing to lower-cost suppliers. However, the data complexity and integration complexity problems won't go away as easily. We need to get more value from the work we do with our data. Today, integration decisions and assets are either 1) materialized in redundant data stores that must be built, hosted, managed and updated, or 2) buried in application-specific code, resulting in excessive costs in development and maintenance. In either case, the costs are too high, and the trend toward more complex integration creates a need for something new. EII solutions take an alternative approach, proposing to serve data in a more dynamic fashion while enabling creation of the integrated data views needed by applications.

Because they create a layer of abstraction, EII solutions promise to help companies create their enterprise data tier without having to invent everything from the ground up. For this reason and others, more organizations are looking closely at EII solutions. In deciding which EII solution is right for you, consider the various options available in the context of your goals and application requirements, and the critical projects on your horizon. If the idea of an enterprise data tier sounds attractive, make sure you know whether the solution you choose gives you a flexible and scalable toolkit or just another hammer for your project silo.
