
Bringing real-time data to the API economy with data virtualization

An application programming interface (API) strategy lies at the heart of most companies’ digital transformations, regardless of industry or size. APIs are the common connection points that enable applications to build on, extend, and work with each other. By exposing data beyond enterprise borders to partner companies, organizations can also take part in a growing, interconnected web of applications and a teeming ecosystem of development.

Most API initiatives are supported by an API management platform (sometimes also called an API gateway). Among other functions, these platforms enable organizations to monetize their APIs, manage connectivity and workloads, monitor usage, and establish security policies. However, they do not deal with what is probably the most costly and difficult part of an API initiative: creating the services that actually implement the API functionality.

One of the most important types of service in an API is the so-called data service, which exposes integrated, curated data to consumers in other parts of the organization or in different organizations.

The typical approach to creating data services starts by integrating and consolidating the required data in a new repository. This can be slow and costly because many data services need to combine and integrate data from multiple underlying repositories that use heterogeneous data formats.

The alternative to physical consolidation is custom-coding data integration processes across several data sources in real time, which is complex and can result in poor performance if not done properly.

In addition, the data in the new repository needs to be exposed using the dominant technologies for building data services, like REST or GraphQL. This requires creating custom code to allow client applications to define the filters and options they want to apply to the data. The service usually also needs to comply with a number of standards in areas such as security (e.g. OAuth), data representation (e.g. JSON, XML) and documentation (e.g. OpenAPI).
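To give a sense of the custom code involved, here is a minimal sketch of the filter handling a hand-written REST data service might need. The `CUSTOMERS` data, the `FILTERABLE` set, and the `fields` and `limit` parameter names are all assumptions made for illustration; a real service would also need authentication, validation, and error handling.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory rows standing in for the consolidated repository.
CUSTOMERS = [
    {"id": 1, "name": "Acme", "country": "ES", "segment": "retail"},
    {"id": 2, "name": "Globex", "country": "US", "segment": "retail"},
    {"id": 3, "name": "Initech", "country": "ES", "segment": "finance"},
]

# Columns that clients are allowed to filter on (an assumption).
FILTERABLE = ("country", "segment")


def query_customers(query_string: str) -> list[dict]:
    """Apply client-supplied filters, field selection, and a row limit."""
    params = parse_qs(query_string)
    rows = CUSTOMERS
    # Equality filters, e.g. ?country=ES&segment=retail
    for column in FILTERABLE:
        if column in params:
            wanted = params[column][0]
            rows = [r for r in rows if str(r[column]) == wanted]
    # Field selection, e.g. ?fields=id,name
    if "fields" in params:
        fields = params["fields"][0].split(",")
        rows = [{f: r[f] for f in fields if f in r} for r in rows]
    # Result size, e.g. ?limit=10
    if "limit" in params:
        rows = rows[: int(params["limit"][0])]
    return rows
```

Calling `query_customers("country=ES&fields=id,name")` returns only the `id` and `name` of the two customers in Spain. Every new data service built this way repeats some variant of this plumbing.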

All this results in implementation times that may be on the order of weeks or even months for new data services. In addition, reusing common data transformations and combinations across many data services is very difficult. This not only hurts productivity; it can also create consistency problems, because the same logic ends up implemented slightly differently in each service.

Data virtualization (DV) helps solve these problems by allowing users to create logical views and integrate data across multiple data sources without needing to replicate the data in a new repository. Consumers (humans and/or applications) can access the data they need without worrying about where it is physically located or which native formats each underlying data source uses. Advanced automatic optimization techniques and sophisticated caching mechanisms ensure that data is delivered with the right performance.
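The idea of a logical view can be illustrated with a small, hand-written sketch. It assumes two hypothetical sources, an in-memory SQLite database of customers and a CSV export of orders, and combines them at query time without copying either into a new store. A DV platform would define such a view declaratively and add query optimization and caching; none of that is shown here.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational database holding customers.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (hypothetical): a CSV export from an order system.
ORDERS_CSV = "customer_id,amount\n1,100.0\n1,50.5\n2,20.0\n"


def customer_totals_view() -> list[dict]:
    """Logical view: joins both sources on each call and stores nothing."""
    # Aggregate order amounts per customer from the CSV source.
    totals: dict[int, float] = {}
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    # Join the aggregates with customer names from the database source.
    result = []
    for cid, name in crm.execute("SELECT id, name FROM customers ORDER BY id"):
        result.append({"customer": name, "total": totals.get(cid, 0.0)})
    return result
```

The consumer of `customer_totals_view()` sees one uniform list of rows and does not need to know that one source speaks SQL and the other is a flat file.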

By avoiding data replication, this approach can save time, reduce complexity, and potentially eliminate the risk of errors that might be introduced when data moves across multiple transformation servers. Logical views can also be easily reused, since an existing view can serve as the starting point for new ones.
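View reuse can be sketched with ordinary SQL views. The example below uses SQLite views over a single database as a stand-in for DV logical views, which is an assumption made to keep the example self-contained; the table and view names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER, country TEXT, amount REAL, status TEXT);
INSERT INTO orders VALUES
  (1, 'ES', 100.0, 'paid'),
  (2, 'ES', 40.0, 'cancelled'),
  (3, 'US', 70.0, 'paid');

-- Base view: one shared definition of which orders count as revenue.
CREATE VIEW paid_orders AS
  SELECT id, country, amount FROM orders WHERE status = 'paid';

-- Derived view: built on the base view instead of repeating its filter.
CREATE VIEW revenue_by_country AS
  SELECT country, SUM(amount) AS revenue
  FROM paid_orders GROUP BY country;
""")

# Each row is a (country, revenue) pair, so dict() builds a lookup table.
revenue = dict(db.execute("SELECT country, revenue FROM revenue_by_country"))
```

If the definition of a paid order changes, only `paid_orders` needs editing, and every view derived from it picks up the change. This is the consistency benefit the paragraph above describes.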

In traditional uses of DV, the data in the logical views is accessed using SQL. However, some DV systems also allow the views to be exposed as data services using technologies like REST, OData and GraphQL, without the need to write any code.

The data services can be secured out of the box using standards such as SAML or OAuth, and documentation can be automatically generated using standard formats such as OpenAPI. The data services created through data virtualization can also be easily containerized with technologies like Docker and Kubernetes, so they can be scaled up and down on demand.
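As a rough illustration of how documentation can be derived from a view's metadata, the sketch below builds a minimal OpenAPI 3.0 description from a list of columns. The view name, the column metadata, and the convention of one optional query parameter per column are assumptions for this example, not the behavior of any particular DV product.

```python
import json

# Hypothetical column metadata for a logical view named "customers".
VIEW_NAME = "customers"
COLUMNS = {"id": "integer", "name": "string", "country": "string"}


def openapi_for_view(view: str, columns: dict[str, str]) -> dict:
    """Build a minimal OpenAPI 3.0 description for a read-only view."""
    item_schema = {
        "type": "object",
        "properties": {col: {"type": typ} for col, typ in columns.items()},
    }
    return {
        "openapi": "3.0.3",
        "info": {"title": f"{view} data service", "version": "1.0.0"},
        "paths": {
            f"/{view}": {
                "get": {
                    "summary": f"List rows of the {view} view",
                    # One optional query parameter per column, used as a filter.
                    "parameters": [
                        {"name": col, "in": "query", "required": False,
                         "schema": {"type": typ}}
                        for col, typ in columns.items()
                    ],
                    "responses": {
                        "200": {
                            "description": "Matching rows",
                            "content": {
                                "application/json": {
                                    "schema": {"type": "array", "items": item_schema}
                                }
                            },
                        }
                    },
                }
            }
        },
    }


spec = openapi_for_view(VIEW_NAME, COLUMNS)
spec_json = json.dumps(spec, indent=2)  # ready to serve or save as a file
```

Because the description is generated from the view definition, it stays in step with the service when columns are added or removed.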

The fit between DV and data service/API technologies is especially evident in the case of GraphQL, one of the fastest-rising technologies in the API world today. GraphQL provides a query language on top of an API that allows client applications to obtain all the data they need in a single request, without needing to integrate the output of several endpoints.
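The single-request idea can be shown with a toy resolver. This is not a GraphQL implementation: the nested `SELECTION` dictionary only mimics the shape of a GraphQL query, and the data and function names are hypothetical. The point is that one call returns the customer, its orders, and each order's product, which would otherwise take three endpoint calls stitched together by the client.

```python
# Hypothetical data that three separate REST endpoints might serve.
CUSTOMERS = {1: {"id": 1, "name": "Acme"}}
ORDERS = {1: [{"id": 10, "product_id": 7, "quantity": 2}]}
PRODUCTS = {7: {"id": 7, "title": "Widget"}}

# Shape of the request, comparable to the GraphQL query
#   { customer(id: 1) { name orders { quantity product { title } } } }
SELECTION = {"name": True, "orders": {"quantity": True, "product": {"title": True}}}


def pick(record: dict, selection: dict) -> dict:
    """Return only the scalar fields marked True in the selection."""
    return {k: record[k] for k, v in selection.items() if v is True}


def resolve_customer(customer_id: int, selection: dict) -> dict:
    """Resolve the whole nested selection in a single call."""
    result = pick(CUSTOMERS[customer_id], selection)
    if "orders" in selection:
        order_sel = selection["orders"]
        result["orders"] = []
        for order in ORDERS.get(customer_id, []):
            item = pick(order, order_sel)
            if "product" in order_sel:
                product = PRODUCTS[order["product_id"]]
                item["product"] = pick(product, order_sel["product"])
            result["orders"].append(item)
    return result
```

The client states which fields it wants, and the server decides how to gather them from the underlying data.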

This ability to offer unified, declarative query functionality on top of multiple sources is perhaps the strongest point of DV systems, so it's no surprise that some of the most advanced DV systems on the market already offer the option of exposing a GraphQL API on top of any data.

However, data services are not the only ones that can benefit from data virtualization. Process-oriented services that need to implement complex business logic can also use DV behind the scenes when they need to access and combine data. This is because DV removes the complexities of data integration while isolating the services from changes in the underlying repositories.
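A brief sketch of that isolation: the business rule below depends only on the rows a view returns, not on where they come from. The `View` type, the `approve_credit` rule, its default limit, and the sample data are all hypothetical.

```python
from typing import Callable, Iterable

# A "view" here is any callable returning rows: a hypothetical stand-in
# for a query against a DV logical view.
View = Callable[[], Iterable[dict]]


def approve_credit(customer_id: int, exposure_view: View, limit: float = 1000.0) -> bool:
    """Business rule: approve only if total open exposure is below the limit."""
    exposure = sum(
        row["open_amount"]
        for row in exposure_view()
        if row["customer_id"] == customer_id
    )
    return exposure < limit


def exposure_v1() -> list[dict]:
    """Sample rows; in practice these might combine billing and loan systems."""
    return [
        {"customer_id": 1, "open_amount": 400.0},
        {"customer_id": 1, "open_amount": 700.0},
        {"customer_id": 2, "open_amount": 50.0},
    ]
```

If a source system behind the view is replaced, only the view definition changes; `approve_credit` keeps working as long as the view still returns `customer_id` and `open_amount`.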

Just as APIs represent a thriving ecosystem of development for application reuse and repurposing, data virtualization can supply the missing piece of the puzzle: agile, real-time data integration.
