The Danger of Data Silos

The CEO of a Fortune 500 company recently asserted that his company's enterprise systems run hundreds of applications doing exactly the same thing with only slight variations. And in truth, many organizations have customer relationship management (CRM), enterprise resource planning (ERP) and corporate performance management (CPM) applications that each work independently and run on multiple relational databases. These "data silos" are unable to share common types of information, which creates inconsistencies across multiple enterprise applications and leads to poor business decisions.

Implementing a service-oriented architecture (SOA) enables internal and external business applications to communicate while sharing services and features, resulting in reduced IT costs and a more integrated enterprise infrastructure. Developing an SOA successfully involves many incremental changes, but one of the most important is often overlooked: a data services infrastructure that integrates and manages information from the data silos created by legacy systems. Provide an SOA with a data services infrastructure and it will maintain data consistency through synchronization, data agility through semantic mapping, and data access optimization through caching.

Transactional Synchronization for Consistency

Heterogeneous data is usually generated by a combination of applications employed by enterprises to perform business processes. Legacy systems utilize multiple relational databases to provide applications with information and to manage data. However, a one-way distribution model between relational databases and applications creates data silos and results in the need to update the entire system "every night at midnight." This cumbersome process worked in the past because there was no alternative, but entrenching your business in legacy systems today is like investing in the stock market using quotes from last week's newspaper.

An SOA changes the distribution model by using XML messages that move between applications and the data source. Whenever a service requires or creates additional data, it is simply added to the document as part of the process. One of the side effects of this model is that as each service does its part in the process, it typically adds a little more data to the document. This may not be a problem for small systems in the initial phases of adopting an SOA, but as systems and processes become more complex, these documents can grow dramatically, making data management more complex.

The other problem with passing data through XML process documents is that data does not come from a consistent point in time. Data from early in the process can become old and out of date, conflicting with data added later in the process. The inconsistency of the data used in a business process can make it difficult, if not impossible, to make valid decisions based on the information that is being passed between the applications.
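Both effects, the growing document and the mixed points in time, can be seen in a small sketch. The element names, service names and capture times below are hypothetical and chosen only for illustration. Each step appends a section to one process document and records when its data was read.

```python
import xml.etree.ElementTree as ET


def append_section(doc, service, captured_at, fields):
    """Append one service's data to the process document, stamped with its capture time."""
    section = ET.SubElement(doc, "section", service=service, capturedAt=str(captured_at))
    for name, value in fields.items():
        ET.SubElement(section, name).text = str(value)


def capture_times(doc):
    """Return the capture time (in seconds) recorded by each section, in document order."""
    return [int(s.get("capturedAt")) for s in doc.findall("section")]


# Hypothetical three-step process; capture times are seconds since the process started.
steps = [
    ("crm", 0, {"customerId": 42, "creditLimit": 5000}),
    ("erp", 45, {"sku": "A-100", "onHand": 12}),
    ("billing", 180, {"invoiceTotal": 4800}),
]

doc = ET.Element("orderProcess")
sizes = []
for service, captured_at, fields in steps:
    append_section(doc, service, captured_at, fields)
    sizes.append(len(ET.tostring(doc)))  # serialized size after this step

times = capture_times(doc)
spread = max(times) - min(times)  # age gap between oldest and newest data
```

In this sketch the serialized document grows with every step, and by the time the billing step runs, the credit limit it reads from the document is three minutes old.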

A data services infrastructure addresses these issues by establishing shared data services, which provide the free flow of valuable information across multiple relational databases and departmental applications. For data that is frequently changed during the course of a business process yet must remain current throughout that process, shared data services can provide a level of mediation and consistency through transactional synchronization.

Transactional synchronization guarantees the accuracy of information by managing the relationships between data classes and applications to ensure that each is aware of all data changes in real time. The benefits of shared data services include flexibility, re-usability and loose coupling of business data. Because the applications are related, they share a common data model and common data, which ensures each server and client application has up-to-date information to enable more accurate business decisions.
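As a rough illustration of the idea, the following sketch keeps one authoritative copy of the data and pushes each committed change set to every subscribed application. The class and method names are hypothetical, and everything runs in a single process. A production data services layer would also need distributed messaging, rollback and failure handling, none of which are shown here.

```python
import threading
from typing import Callable, Dict, List

Changes = Dict[str, object]


class SharedDataService:
    """Hypothetical shared data service: one authoritative store, notified subscribers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Changes = {}
        self._version = 0
        self._subscribers: List[Callable[[int, Changes], None]] = []

    def subscribe(self, callback: Callable[[int, Changes], None]) -> None:
        self._subscribers.append(callback)

    def commit(self, changes: Changes) -> int:
        # Apply the whole change set as one unit and notify subscribers in
        # commit order, so every application sees the same sequence of versions.
        with self._lock:
            self._data.update(changes)
            self._version += 1
            for notify in self._subscribers:
                notify(self._version, dict(changes))
            return self._version


class Application:
    """Hypothetical client application holding a synchronized local view."""

    def __init__(self, name: str, service: SharedDataService) -> None:
        self.name = name
        self.view: Changes = {}
        self.version = 0
        service.subscribe(self.on_change)

    def on_change(self, version: int, changes: Changes) -> None:
        self.view.update(changes)
        self.version = version


service = SharedDataService()
crm = Application("crm", service)
erp = Application("erp", service)
service.commit({"customer:42:credit_limit": 5000})
service.commit({"customer:42:credit_limit": 7500, "customer:42:status": "gold"})
```

After the second commit, both applications hold the same view at the same version, which is the property the shared service is meant to provide.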

Semantic Mapping for Agility

Data silos also rear their ugly heads when relational database management intersects with object-oriented design. A key feature of object-oriented design is encapsulation, which hides an object's internal data behind its interface so that design decisions can change without destabilizing the processes that depend on it. While encapsulation serves its purpose in object-oriented design, hiding information from relational databases results in an object-relational impedance mismatch. The mismatch between data representation and application processes is exacerbated by disparate languages and differences of scale. Relational databases operate with rigid columns and rows, while object-oriented design imposes fewer rules on developers. Additionally, because objects keep their data private and fine-grained, even the smallest relational transaction can span more data than a single object is designed to handle. It is a problem of square pegs and round holes.

In an SOA, object-relational mapping addresses the impedance mismatch by translating between objects and relational rows through object stores. This requires software to work under two separate paradigms: objects are used to process data, and relational databases are used to store it. The solution minimizes the coding of low-level infrastructure, supports object-oriented models and addresses the differences in scale between objects and relational databases by automatically mapping requests between the two. Virtual libraries of schema provide lists of the tables in the database and the objects in the applications. The best solutions allow the object model to impose as few constraints as possible on the relational schema and vice versa.
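A minimal sketch of the mapping idea follows, using Python's standard sqlite3 module and a dataclass. The Customer class, the customer table and the CustomerMapper helper are hypothetical and are not taken from any particular product. The mapper derives the column list from the object's fields, so application code works only with objects while storage stays relational.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass
class Customer:
    id: int
    name: str
    region: str


class CustomerMapper:
    """Hypothetical mapper translating Customer objects to and from table rows."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Column names are derived from the object's fields, in declaration order.
        self.columns = [f.name for f in fields(Customer)]
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
        )

    def save(self, customer: Customer) -> None:
        cols = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        self.conn.execute(
            f"INSERT OR REPLACE INTO customer ({cols}) VALUES ({marks})",
            astuple(customer),
        )

    def load(self, customer_id: int) -> Optional[Customer]:
        cols = ", ".join(self.columns)
        row = self.conn.execute(
            f"SELECT {cols} FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


conn = sqlite3.connect(":memory:")
mapper = CustomerMapper(conn)
mapper.save(Customer(id=42, name="Acme", region="EMEA"))
loaded = mapper.load(42)
```

The application never writes SQL for a Customer directly. A fuller mapping layer would also handle relationships between objects, schema changes and transactions, which this sketch leaves out.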

One of the benefits of a service-oriented approach is the agility it brings to the development process. By mapping individual services, new processes can be developed from components. As these services evolve, so will their requirements for data. In order to facilitate this kind of agility at the data level, there is a need for semantic mapping between the data sources that manage the data and the services that consume it.

Basic mapping technology, such as XSLT or hand-coded Java, can transform the exchanges between data services, but as the adoption of SOA increases, these simple mechanisms scale poorly: each new service model must be mapped to every other service with which it may need to interact. As services become more fine-grained, the processes built from them must become even more agile, which puts more pressure on the data services layer. The number of mappings needed to connect many data services to many others grows roughly with the square of the number of services and quickly becomes unwieldy.
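The growth is easy to quantify under a simple assumption: every service needs a one-way transform to every other service. In that case n services require n × (n − 1) transforms. If each service instead maps only to and from a single shared model, 2 × n transforms are enough. The function names below are illustrative.

```python
def point_to_point_mappings(n: int) -> int:
    """One-way transforms needed when each of n services maps to every other service."""
    return n * (n - 1)


def shared_model_mappings(n: int) -> int:
    """Transforms needed when each service maps only to and from one shared model."""
    return 2 * n


for n in (5, 20, 50):
    print(n, point_to_point_mappings(n), shared_model_mappings(n))
```

With 20 services, that is 380 point-to-point transforms against 40 through a shared model. At 50 services the gap widens to 2,450 against 100.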

Semantic mapping provides a common model approach to data services by relating the concepts that different schemas share. Consider semantic mapping akin to the Linnaean taxonomy in science, which provides a naming hierarchy of genus and species to differentiate similar animals, such as wolves and coyotes.

The focus of semantic mapping is for data services to consider the meaning of information instead of getting hung up on the structure of the data. Note that this common model need not be the format used for exchanging information between services, and in fact, it may not be materialized in the process at all. The important aspect is the power it provides in simplifying the mapping effort. Utilizing a single map or hierarchy as the sole source of data truth liberates developers from coding low-level infrastructure and automatically ensures compliance between relational databases and objects.
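One way to picture the common model is as a table that ties each service's local field names to shared concept names. In the sketch below, the service names, field names and helper functions are all hypothetical. A record moves between any two services by passing through the shared concepts, so each service declares only one mapping.

```python
from typing import Dict

Record = Dict[str, object]

# Hypothetical local field names, keyed by service, mapped to shared concept names.
TO_CANONICAL: Dict[str, Dict[str, str]] = {
    "crm": {"cust_no": "customer_id", "cust_nm": "customer_name"},
    "erp": {"CustomerID": "customer_id", "CustomerName": "customer_name"},
    "billing": {"acct": "customer_id", "payer": "customer_name"},
}


def to_canonical(service: str, record: Record) -> Record:
    """Rename a service's local fields to the shared concept names."""
    mapping = TO_CANONICAL[service]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def from_canonical(service: str, record: Record) -> Record:
    """Rename shared concept names back to a service's local fields."""
    reverse = {concept: local for local, concept in TO_CANONICAL[service].items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


def translate(source: str, target: str, record: Record) -> Record:
    """Move a record between two services by way of the common model."""
    return from_canonical(target, to_canonical(source, record))


erp_record = translate("crm", "erp", {"cust_no": 7, "cust_nm": "Acme"})
billing_record = translate("erp", "billing", erp_record)
```

Adding a fourth service means adding one entry to the table, not a new transform for each of the existing three. Real semantic mapping also has to reconcile units, data types and meaning, which a field-renaming sketch like this does not attempt.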

Cache Synchronization for Access Optimization

The final danger of data silos is stale data, which can arise because relational databases were not designed for high volumes of enterprise system traffic. They provide only a limited number of connections, creating bottlenecks that block incoming requests. The integrity of data also comes into question when multiple applications demand access to the same data at the same time.

Optimization is another important area in which data services can facilitate a more effective implementation of an SOA. One of the numerous benefits of a service-oriented approach is the ability to reuse services, and services are often constructed for just this purpose. A common pattern in adopting an SOA is to begin by identifying specific business functions that are applied across a number of existing business processes. These functions are then pulled out and packaged as separate services to be reused in a consistent way across all of these systems.

This can be a great benefit, but it carries with it some risks and challenges. These common services can become bottlenecks in the business processes. Because many processes are now relying on these central services, the load on them can increase dramatically. Therefore, they have to be built for high performance and scalability.

These common services can also become single points of failure. In the event that they do fail, the many business processes that rely on them will come to a halt as well. In addition, if the service remains up and running but becomes disconnected from the services that are relying on it, it can still result in undesirable process failures.

With the introduction of caching capabilities in a data services layer, these challenges can be addressed. To optimize performance and scalability, a distributed, in-memory cache of data can dramatically improve the services' ability to handle large volumes of requests. Likewise, redundant copies of the services utilizing a distributed caching layer can be used to eliminate the risk of the service becoming a single point of failure. Even network partitioning can be accounted for if the caching layer is smart enough to support continuous operation with disconnected capabilities and durable state management, even when the service cannot directly contact the source of its data.
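The sketch below illustrates two of these ideas on a small scale: serving repeated reads from memory, and continuing to answer from the last known value when the data source cannot be reached. It is a single-process, read-through cache rather than a distributed one, and the class name, exception and time-to-live parameter are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class SourceUnavailable(Exception):
    """Raised by a loader when the underlying data source cannot be reached."""


class ReadThroughCache:
    """Hypothetical in-memory cache that falls back to stale entries during outages."""

    def __init__(
        self,
        loader: Callable[[str], object],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no trip to the data source
        try:
            value = self._loader(key)
        except SourceUnavailable:
            if entry is not None:
                return entry[1]  # source is down: serve the last known value
            raise  # nothing cached, so the failure is passed to the caller
        self._entries[key] = (now, value)
        return value


# Usage with a simulated source and a simulated clock.
state = {"up": True, "now": 0.0}
calls = []


def load_from_database(key: str) -> object:
    if not state["up"]:
        raise SourceUnavailable(key)
    calls.append(key)
    return key.upper()


cache = ReadThroughCache(load_from_database, ttl_seconds=10, clock=lambda: state["now"])
first = cache.get("a")   # loaded from the source
second = cache.get("a")  # served from memory
state["now"], state["up"] = 20.0, False
during_outage = cache.get("a")  # expired, but the source is down
```

Serving a stale value during an outage trades freshness for availability. Whether that trade is acceptable depends on the business process, and a distributed caching layer would also need replication and invalidation between nodes.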

H. G. Wells once said, "Adapt or perish, now as ever, is Nature's inexorable imperative." By combining a data services infrastructure with an SOA, businesses are prepared to develop, deploy, integrate and manage their databases against the flux of new processes created by business changes and regulations. With capabilities for synchronization, mapping and caching, data services can address many of the challenges presented when developing and evolving a service-oriented approach to business applications. With this kind of capable data management infrastructure, it becomes possible to create agile, optimized services that provide a high degree of consistency to the business processes that rely on them. Businesses that adopt practices such as these can turn their data silos from an archaic liability into a shared, productive asset.
