
Model-Driven Data Integration

  • June 01 2004, 1:00am EDT

Systems hardware is getting faster, and technologies such as ETL (extract, transform and load), EAI (enterprise application integration), message-oriented middleware and Web services offer many new data access options. Despite these advances in the underlying technology, data integration is an increasingly daunting task. Some obvious factors fuel this trend: data volumes are growing rapidly, enterprise infrastructures comprise an ever-growing number of complex systems, and the business value of integrated data as a source of competitive advantage is on the rise.

The relationship between data volumes and the data integration challenge seems straightforward. There are more bytes and bits that need to be stored, read, manipulated and delivered to a target. Data volume impacts the fundamental integration concerns - task frequency, latency, physical location, storage, network and systems topology.

Yet only a decade ago, the standard PC came with an 80MB hard drive, and gigabyte-sized databases ran on huge, million-dollar servers. Today, a high-end laptop handles the same workload. The challenge of managing a gigabyte-sized database 10 years ago is not all that different from managing a terabyte-sized database today. Therefore, it isn't really just about data volume.

There is also a relationship between the complexity of the environment and the data integration challenge. When more systems accumulate and share data, more work is needed to integrate them. Further, if the integration relies on one-to-one couplings (as is the case with most handwritten scripts), the number of interfaces grows geometrically with the number of systems. For example, if you have seven systems integrated with one another through one-to-one scripts, the eighth system adds seven more one-to-one interfaces and the ninth adds another eight.
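
To make that arithmetic concrete, here is a minimal sketch in Python (purely illustrative; the article prescribes no particular tooling) that counts the one-to-one interfaces a fully coupled environment requires as systems are added:

    # Each new system must be wired to every existing one, so n systems
    # connected with one-to-one scripts require n * (n - 1) / 2 interfaces.
    def point_to_point_interfaces(n_systems):
        return n_systems * (n_systems - 1) // 2

    for n in range(7, 10):
        added = point_to_point_interfaces(n) - point_to_point_interfaces(n - 1)
        print(f"{n} systems: {point_to_point_interfaces(n)} interfaces "
              f"({added} added by system {n})")

Seven systems already require 21 interfaces; the eighth adds seven more and the ninth another eight, exactly the growth described above.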

Rapid data growth and growing infrastructure complexity make data integration increasingly difficult. However, what has really changed is the business need for integrated data. Only 10 years ago, when customers called vendors, they didn't really expect vendor representatives to have real-time access to purchase, upgrade or maintenance records. Sales executives didn't check whether or not customers had outstanding support tickets before making sales calls. Reports for the previous month were available on the 15th of the current month. Banks sent end-of-year statements in March of the next year, and that was fine. Not anymore.

The pervasive need for integrated data has changed the way that IT departments must handle data integration. It isn't just about moving bytes and bits across the wire anymore, and it isn't just the volume of the data - it is about getting integrated data that is relevant, accurate and easily accessible in a timely manner. Integrated data must be available as a service within the infrastructure.

When you plug your computer into an electric socket, you know that you will get clean alternating current at 110V. When you plug a PC into a network drop, you expect to get a TCP/IP connection. When you browse the Internet, you expect to receive HTML pages over HTTP. These things have become ubiquitous services within their respective infrastructures. Access to integrated data must likewise become a service to accommodate the ever-growing business needs it fulfills.

To become a service within the infrastructure, data must be coupled with its meta data. When data comes with its descriptors, its context can be kept intact. The data dictionary derived from a performance-optimized database schema provides the field names and perhaps the primary and foreign key constraints, but it doesn't offer a logical view. Inevitably, a data professional tasked with integration spends an inordinate amount of time and effort on simple questions such as, "Should I concatenate the FIRST_NAME and LAST_NAME fields from the MAINT_CONTACT table, or should I use FULL_NAME from the CONTACTS table?"
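
As a minimal sketch of what coupling data with its meta data might look like, the example below attaches a logical view to the physical schema so that the integrator's question is answered by the meta data rather than by guesswork. The table and field names are taken from the example above; the mapping structure itself is an illustrative assumption, not a prescribed MDDI format.

    # A logical model maps a business concept ("contact name") to the
    # physical fields that realize it and the rule for combining them.
    logical_model = {
        "contact_name": {
            "description": "Display name of the customer contact",
            "preferred_source": {"table": "CONTACTS", "field": "FULL_NAME"},
            "fallback_source": {
                "table": "MAINT_CONTACT",
                "fields": ["FIRST_NAME", "LAST_NAME"],
                "rule": "concatenate, separated by a single space",
            },
        }
    }

    def resolve(concept):
        # Answer "which field should I use?" from the captured meta data.
        return logical_model[concept]["preferred_source"]

    print(resolve("contact_name"))  # {'table': 'CONTACTS', 'field': 'FULL_NAME'}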

Model-driven data integration (MDDI) is a service-oriented data integration approach that incorporates and proactively utilizes meta data across the data integration process. By coupling data and meta data, MDDI drastically reduces the complexity and provides data integration that is aware of the context of the data.

Data integration as a service creates a continuum of meta data throughout the integration task. Data sources, targets and the integration flow are captured and collated; thus, you have a visible answer to where each piece of data comes from, how it is integrated and to whom it is delivered. Effectively, the know-how of the integration process is captured and documented.
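
A minimal sketch of what that captured continuum might contain (the record layout and names are illustrative assumptions): each integration task carries a meta data record that ties sources, transformations, target and consumers together, so lineage questions are answered directly from the record.

    # A hypothetical lineage record for one integration task.
    lineage = {
        "task": "load_customer_support_view",
        "sources": ["CRM.CONTACTS", "SUPPORT.TICKETS"],
        "transformations": ["join on customer_id", "derive open_ticket_count"],
        "target": "DW.CUSTOMER_SUPPORT_VIEW",
        "consumers": ["sales dashboard", "monthly account review report"],
    }

    def where_from(record):
        # Answer "where does this data come from?" from the meta data.
        return record["sources"]

    print(where_from(lineage))  # ['CRM.CONTACTS', 'SUPPORT.TICKETS']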

A benefit of data integration as a service is a centralized audit trail of task execution metrics. Just as the phone company keeps records of all phone calls made in its coverage area, an integration service captures detailed information on tasks that have been carried out across the infrastructure. This is an important fringe benefit in today's compliance-driven environment.
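
Here is a minimal sketch of such an audit trail, assuming a simple append-only list of execution records (the field names and example task are hypothetical):

    from datetime import datetime, timedelta

    audit_trail = []

    def record_run(task, rows, started, finished):
        # Append one execution-metrics record per completed task,
        # analogous to the phone company's per-call records.
        audit_trail.append({
            "task": task,
            "rows_processed": rows,
            "started": started.isoformat(),
            "duration_seconds": (finished - started).total_seconds(),
        })

    start = datetime(2004, 6, 1, 1, 0)
    record_run("load_customer_support_view", rows=125000,
               started=start, finished=start + timedelta(minutes=12))
    print(audit_trail[-1])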

Leveraging the meta data about integration tasks and their operational execution metrics, MDDI creates "machine-generated, human-readable" documentation that provides consistent insight into data integration operations. To accommodate different organizational requirements within a company, MDDI-generated documentation offers varying levels of granularity. By presenting consistent, complete information pertinent to each business function, MDDI fosters effective collaboration across the enterprise. The systems administrator can look up how long it takes to move the data, the DBA can see which databases are affected and the business analyst can review the dataflow models to see what actually happens to the data.
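
One way such documentation could be rendered at different levels of granularity is sketched below; the audiences, wording and task record are illustrative assumptions, not an MDDI specification.

    # Render role-specific views of the same captured meta data.
    task_meta = {
        "task": "load_customer_support_view",
        "sources": ["CRM.CONTACTS", "SUPPORT.TICKETS"],
        "target": "DW.CUSTOMER_SUPPORT_VIEW",
        "transformations": ["join on customer_id", "derive open_ticket_count"],
        "duration_seconds": 720,
    }

    def document(meta, audience):
        if audience == "sysadmin":
            return f"{meta['task']}: ran for {meta['duration_seconds']} seconds"
        if audience == "dba":
            return f"{meta['task']}: reads {', '.join(meta['sources'])}; writes {meta['target']}"
        # Business analyst: the dataflow view.
        return f"{meta['task']}: {' -> '.join(meta['transformations'])} -> {meta['target']}"

    for audience in ("sysadmin", "dba", "analyst"):
        print(document(task_meta, audience))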

MDDI must be standards-oriented. On one hand, by adhering to modeling and nomenclature standards, MDDI must ensure that the same meta data can be utilized by different systems within the infrastructure. On the other hand, MDDI must provide the needed flexibility and extensibility by leveraging standards-based technologies. Its adaptive nature reduces the impact on existing IT operations and lowers the associated overhead.

Analysts predict that a typical enterprise will devote 35 to 40 percent of its programming budget to programs whose sole purpose is transferring information between different databases and legacy systems. Access to integrated data is no longer a nice-to-have - it is a necessity. Web services, XML and other standards-based technologies are maturing. Service-oriented architectures require service-oriented development of applications. Data, at the heart of information technology, must become service oriented as well.
