Developing 'anchors' for data – Ensuring everyone sees the same characteristics
All organizations have gradually adopted more than one visualization platform. Each business team chooses a platform of their choice for specific niche features and flexibility offered, such as natural language processing interface, availability on cloud, lesser technology team dependency, data wrangling support, social data support and for quick self service capability.
Often the business definitions and the metrics calculations are embedded into the semantic layer of a visualization platform such as Business Objects, Cognos or into a data set definition in platforms such as Tableau, SSRS. In organization that have multiple visualization platforms, this leads to the existence of varied definitions, and key enterprise-wide metrics also end up having different values due to the difference in calculations across the semantic layers.
The challenge is not about getting required data together in an integrated form for access, but in ensuring that integrated data which is made available for access is seen by everyone, and has similar characteristics in terms of name, definition and metrics calculations through whatever visualization tools the business teams have.
The technology team strives to maintain uniform definition, deliver similar data set view and ensure uniform metrics calculation value across the business teams. Whether the semantic / data set layer is created by the business or the technology team, there is a challenge to maintain the same definitions across the various visualization layers.
Data Anchor Layer
The data anchor layer ensures users from different business teams are anchored with the same data definition in business names as well as the metrics calculation that are defined as standards across the enterprise.
There are two ways by which a data anchor layer can be built. They are:
- Building a database object layer
- Building a data virtualization layer
Database Object Layer
In this method the business names and the metrics are defined as part of the database objects. We can have this done in two ways:
- Build the cube model based on platforms such as SSAS, Hyperion and TM1.
- Have all the business names and metrics defined in the cube model.
- Enable all visualization platforms access the cube.
Since most business intelligence platforms support cubes such as SSAS, Essbase, this method will ensure all of them see the same definition and metrics. There could be potential variations across platforms in terms of specific cube features that are supported. The key advantage will be better performance and as well having common hierarchy definitions.
Database Views and Aggregates
- Build views and aggregate tables.
- Have all business definitions, common key column names and aggregate metrics defined as part of views or as aggregate tables.
- All business metadata is at database level , all visualization platforms will see same definitions.
Unlike in the cube model, in the database views and aggregates’ method the table relationships, hierarchies and metrics calculation will be restricted in terms of completeness.
Both these options at the database object Layer in general will not require new additional software license. Some of the scenarios on how these methods are being leveraged:
- Data warehouse is based on Teradata, and all business definitions and aggregations are defined at the database level as views. Visualization platforms such as SSRS, Tableau and BIRST access the database views.
- Data warehouse based on SQL Server, and all business definitions and aggregations are defined at the database level and as SSAS cubes. Reporting tools such as Cognos, SSRS, PowerBI and the SAS data mining team access these cubes and database views for reporting. The Cognos Framework Model has a very simple just direct pull of metadata from the database layer.
Data Virtualization Layer
In this method the business names and the metrics are defined as part of the data virtualization model. This involves:
Building a virtualization model by pulling together the tables , and creating a virtual aggregate data structure with all business definitions and metrics.
Exposing the model as an SQL interface or as a service.
Enabling all visualization platforms access the exposed data structure.
In this method, additional software license will be required such as Composite or Informatica Data virtualization. Also similar to the database objects model, the calculation of metrics and on-the-fly aggregates can be performance and intensive exposure of table relationships and hierarchies will be restricted in terms of completeness. The key advantage will be that all visualization platforms will not have a challenge in accessing the data exposed.
Since larger organizations, especially financial institutions, are having a higher adoption rate of data lake and data governance platforms, the next step for the chief data officer and their team is to ensure a data anchor layer for the business to access the data.
Data governance platforms are repositories to hold definitions, quality metrics and lineage information, but they are not layers through which a visualization platform can pull data. That process can be aided with the creation of a data anchor layer.