Improving the bottom line, cutting costs and reducing risk by providing better access to information assets is a proven, yet hard to execute, IT strategy.
The challenges to execution are many. Most relate to the need to rationalize and leverage the large volumes of complex, diverse data spread across a wide landscape of application silos and fit-for-purpose data stores. Each of these sources has its own schema and syntax. Few are structured properly for consumption by other applications. Many are incomplete, duplicated – or both.
Data abstraction is critical to success because it provides consuming applications with a consistent, bound data schema across this universe of diverse sources. The keys to successful data abstraction are a rationalized reference architecture and a well-organized team.
When applied to a variety of information delivery problems, data abstraction simplifies the achievement of a number of key business and IT objectives. For example, data abstraction has been shown to accomplish the following:
- Deliver new business information sooner. The time required to fulfill new information needs can be shortened because application developers understand new data sources more quickly.
- Align business and IT models. By hiding data complexity, structure and location issues within its logical business or canonical models, data abstraction can improve specification and solution delivery while cutting solution development time and costs.
- Insulate business and IT changes. Abstraction insulates consuming applications from source changes and insulates data sources from changing consuming applications. Developers are free to build their applications using more stable data views and services. This also allows ongoing changes and relocation of physical data sources without impacting consumers.
- Provide end-to-end governance and control. Developers, modelers, database administrators, data stewards and more can align themselves around a common approach and unified schema, from source to consumer.
- Secure data more consistently. Data security methods and controls are applied consistently across all data sources and consumers.
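The insulation benefit in the list above can be illustrated with a minimal Python sketch. Every name in it (the Customer record, the two source readers and their column names) is a hypothetical stand-in rather than part of any particular data virtualization product. The point is only that consumer code depends on the canonical shape, so a change to a source touches one adapter and nothing else.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Customer:
    """Canonical (business-level) shape that consumers depend on."""
    customer_id: str
    full_name: str


def read_crm_rows() -> List[Dict[str, str]]:
    # Hypothetical source A: a CRM export with its own column names.
    return [{"CUST_NO": "17", "NM": "Ada Lovelace"}]


def read_billing_rows() -> List[Dict[str, str]]:
    # Hypothetical source B: a billing system with a different schema.
    return [{"id": "42", "first": "Alan", "last": "Turing"}]


def crm_to_customer(row: Dict[str, str]) -> Customer:
    # Adapter: the only place that knows the CRM column names.
    return Customer(customer_id=row["CUST_NO"], full_name=row["NM"])


def billing_to_customer(row: Dict[str, str]) -> Customer:
    # Adapter: the only place that knows the billing column names.
    return Customer(
        customer_id=row["id"],
        full_name=f"{row['first']} {row['last']}",
    )


def all_customers() -> List[Customer]:
    """Abstraction layer: merges both sources into the canonical shape."""
    customers = [crm_to_customer(r) for r in read_crm_rows()]
    customers += [billing_to_customer(r) for r in read_billing_rows()]
    return customers


def customer_names() -> List[str]:
    """Consumer code: sees only Customer, never a source schema."""
    return sorted(c.full_name for c in all_customers())


print(customer_names())
```

If the billing system later renames its columns or is moved to another store, only billing_to_customer and read_billing_rows would need to change in this sketch; customer_names stays as it is.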
Data Abstraction Reference Architecture
The data abstraction reference architecture described below is a useful way to enable data abstraction success. It is a multi-layer implementation that breaks down complexity into more easily understood and developed components, each attempting to resolve only part of the overall problem. These components are flexible: they can be built as an enterprise-wide data abstraction layer, or they can be built and deployed as needed, based on actual projects. When combined, they support enterprise-wide data abstraction (see Figure 1 below).
- Data consumers: Consuming applications retrieve data in formats and protocols that they understand, including Web services, REST, JDBC, Java clients and more.
- Mapping layer: This layer maps the business layer to the format in which each data consumer wants to see the data. It might mean formatting into XML for Web services or creating relational views with different alias names that match how consumers prefer to see their data.
- Business layer: Predicated on the practice that the business has a standard or canonical way to describe key business entities such as customers and products, the business layer contains the definitions for a set of “logical” or “canonical” views that represent these business entities. In the financial industry, for example, one often accesses information according to financial instruments and issuers. Typically, a data modeler works with business subject matter experts and data providers to define this set of views, which become reusable components that can and should be used across business lines by multiple consumers.
- Formatting layer: Here physical data sources are integrated into the overall abstraction layer. Activities include name aliasing, value formatting, data type casting, derived columns and light data quality mapping. In general, this layer is derived from the physical data sources, performing one-to-one mapping between the physical source attributes and their corresponding “logical/canonical” attribute names. Naming conventions are very important and are introduced in this layer.
- Physical layer: This layer in effect mirrors the physical data sources. It is used as a way to onboard the metadata required for data abstraction. As an “as-is” layer, entity names and attributes are never changed.
- Data sources: These are the physical information assets that exist across an extended enterprise. They may be databases, NoSQL data stores, packaged applications such as SAP, Web services, Excel spreadsheets and more.
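The four layers between data sources and data consumers can be sketched as a stack of relational views. The example below is a minimal illustration and makes several assumptions: it uses an in-memory SQLite database from Python's standard library in place of data virtualization middleware, and the table, view and column names (crm_cust, phys_crm_cust, fmt_customer, biz_customer, app_billing_customer) are invented for the example. A real business layer would typically combine several sources; one is used here to keep the sketch short.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create one hypothetical source and a view for each abstraction layer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Data source (hypothetical): source-specific names and types.
        CREATE TABLE crm_cust (CUST_NO TEXT, CUST_NM TEXT, BAL_CENTS INTEGER);
        INSERT INTO crm_cust VALUES ('17', ' ada lovelace ', 12550);

        -- Physical layer: as-is mirror, names and attributes unchanged.
        CREATE VIEW phys_crm_cust AS
        SELECT CUST_NO, CUST_NM, BAL_CENTS FROM crm_cust;

        -- Formatting layer: name aliasing, type casting, value formatting
        -- and a derived column, mapped one-to-one from the physical layer.
        CREATE VIEW fmt_customer AS
        SELECT CAST(CUST_NO AS INTEGER) AS customer_id,
               TRIM(CUST_NM)            AS customer_name,
               BAL_CENTS / 100.0        AS balance
        FROM phys_crm_cust;

        -- Business layer: canonical view of the customer entity.
        CREATE VIEW biz_customer AS
        SELECT customer_id, customer_name, balance FROM fmt_customer;

        -- Mapping layer: aliases that one consumer prefers to see.
        CREATE VIEW app_billing_customer AS
        SELECT customer_id   AS id,
               customer_name AS name,
               balance       AS amount_due
        FROM biz_customer;
        """
    )
    return conn


conn = build_demo()
# The consumer queries only the mapping-layer view.
rows = conn.execute(
    "SELECT id, name, amount_due FROM app_billing_customer"
).fetchall()
print(rows)
```

In this arrangement, a change to the source table is absorbed in the formatting view, and a new consumer with different naming preferences gets its own mapping view over the same business view.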
Data Abstraction Roles and Responsibilities
Implementing an enterprise-scale data abstraction layer involves a variety of IT staff, each playing an important role. These include (see Figure 2 below):
- Application developers specify the application programming interfaces and the various mechanisms by which the APIs can access the data from the data abstraction layer.
- Enterprise data modelers and data architects work with subject matter experts to craft logical data models. These logical data models can be further refined into something that is closer to a logical database design to be used as the basis for building views in the business layer. Enterprise data modelers work with data virtualization platform developers to design effective views and services.
- Data virtualization platform developers are responsible for implementing the various data abstraction layers using data virtualization middleware. This implementation is based on the overall architecture and concepts provided by the reference architecture and performed in conjunction with the other teams involved. To construct the services and views properly, these developers need to work with the application developers with regard to the functional and protocol requirements, plus the database administrators with regard to the underlying sources.
- Database administrators and database modelers provide access to the physical data sources. Further, they help define logical database designs from their corresponding physical database designs and implementations. They work with data virtualization platform developers to ensure views and procedures are properly tuned. They also assist with creating necessary indexes.
Data governance teams are involved throughout the process to ensure that enterprise data access and data mapping rules are followed. Further, they provide guidance on how data should be modeled and often help to resolve data quality issues.
Practical Next Steps
Enterprises can achieve data abstraction’s benefits by taking the following simple steps. The key to starting quickly is to select manageable projects that both enable immediate learning and build a foundation for long-term progress.
- Select a data virtualization platform. Look for a supplier with proven customers and domain experts who understand how reference architecture, roles and technology combine to achieve data abstraction success.
- Set achievable goals. Start with manageable projects and a focused team. With each success, broaden business and IT team involvement to expand usage across departments toward the ultimate enterprise-level deployment.
- Determine levels of abstraction. Are the four recommended layers sufficient or should additional layers be added for greater depth?
- Determine modeling and mapping approach. Should a top-down or bottom-up approach be adopted? Or should a combination of the two be used to achieve maximum success?
- Start now! Don’t overanalyze. Getting started with small steps is the best way to learn, progress and gain value.
- Continuously improve. Data abstraction is a journey that fosters learning and improvement along the way.
Although there are many challenges to improving access to business-critical information assets, IT plays an important role in the organization’s ultimate success. Strategic IT teams begin by mapping out the reference architecture for data abstraction, working on manageable projects that not only reveal the data assets, but also become the building blocks to form the enterprise-wide data architecture for both near-term and long-term success.
In addition to this technology framework, these teams pay equal attention to the roles and responsibilities within the IT team. The result is improved access to information that is adaptable to the changes inevitable in today’s business climate.