BI’s ‘semantic layer’: After 25 years, it’s time for a change
Engineers like to use abstractions to simplify complex concepts – making detail-rich notions more palatable in the real world. Abstractions lower complexity and conserve our scarce brain power by allowing us to consider systems conceptually and not molecularly.
In business, abstractions are similarly useful. Business people deal with complex models incorporating lots of details – each of which is important both by itself and as part of the system as a whole. Using abstraction, businesses grasp and understand contextual meaning.
In a fast-moving business environment, abstractions also enable business people to function independently of IT, without constant back-and-forth requests for reports yet also without costly mistakes due to overlooked details. Once these details are abstracted in business logic, non-technical staff can do their jobs better, avoiding the subtle yet costly inaccuracies that can have a significant impact on the bottom line.
In the realm of business intelligence (BI), the semantic layer is the key abstraction used in most implementations. The word “semantic” in the data context means simply “from the user’s point of view.” It’s an elegant solution to a problem of potentially unbounded complexity. And it’s not a new concept. In fact, it’s been around since 1991, when it was patented by Business Objects (and later successfully challenged by MicroStrategy).
It’s time for the semantic layer to evolve
All great ideas evolve, and you would expect the semantic layer to be no exception. Yet, surprisingly, the semantic layer concept introduced nearly a quarter century ago hasn’t changed much.
This is a problem, because the semantic layer has the power to be a true BI force multiplier, and modern BI is not fully leveraging it. Owing to the limitations of traditional technology, and a lack of in-depth understanding of the underlying data structures and calculations, the semantic layer has been largely confined to operational reports, dashboards, and business analysis. Yet it has the potential to achieve much more, especially given the data challenges that have arisen over the past decade, which include:
Far more data is being generated today than ever before. Most US companies already hold over 100 terabytes, and projections show 20 zettabytes being created in 2020. With an ever-expanding data pool, a more effective semantic layer, together with aggregates, will result in lower maintenance (of ETL pipelines, for example) and faster analysis.
As data comes in faster, outdated semantic layer approaches with a static build phase can’t keep up. A more dynamic and adaptive approach is needed – one that can automatically map from source to aggregate, for example, and perform incremental updates.
More diverse types of data are introduced daily. Today’s semantic layer needs to make this semi-structured data (think JSON and key-value pairs) look structured and relational behind its abstractions. This will enable business users to keep using standard visualization tools like Tableau, Excel, Qlik and others, which have no knowledge of the underlying formats. It will also eliminate the need for complex, maintenance-intensive, repetitive data movement or ETL. Moreover, there will be no need to retrain end users on a new visualization UX.
In many instances, business users have lost confidence in the veracity of the data they use to make crucial decisions. A modern semantic layer needs to offer abstractions based on proven, tested structures and calculations, thus raising user trust.
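As a rough illustration of the point above about making semi-structured data look relational, the sketch below flattens nested JSON records into rows with a fixed set of columns. It is only a minimal example under simplifying assumptions: the function names (flatten, to_table), the dotted column naming, and the sample events are all hypothetical, and a real semantic layer would also have to handle arrays, type conflicts and schema drift.

```python
import json


def flatten(record, prefix=""):
    """Turn a nested dict into a single-level dict with dotted column names."""
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


def to_table(json_lines):
    """Return (columns, rows) so that every row has the same columns."""
    flat = [flatten(json.loads(line)) for line in json_lines]
    # The union of all keys seen becomes the relational schema.
    columns = sorted({column for row in flat for column in row})
    # Missing attributes become None, the equivalent of SQL NULL.
    rows = [tuple(row.get(column) for column in columns) for row in flat]
    return columns, rows


# Hypothetical sample events with differing shapes.
events = [
    '{"user": {"id": 1, "region": "EMEA"}, "amount": 20.0}',
    '{"user": {"id": 2}, "amount": 5.5, "channel": "web"}',
]
columns, table = to_table(events)
```

A tool that only understands tables could then consume columns and table without knowing the records started out as JSON.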
The upside of the evolved semantic layer
The new semantic layer doesn’t represent a single abstraction. Rather, it should group abstractions that address varying challenges. This allows for significant improvements in:
Acceptance, usability, understandability
Business users don’t want to rely on IT to build or alter reports. The new semantic layer needs to offer agile tooling that helps users understand the impact of modifying a query, without involving IT yet also without lowering confidence in the results.
Security and governance
Legacy semantic layers can’t track the lineage of data from row level to aggregate. Yet today’s tightly regulated enterprise needs to know exactly who saw which data, and when, in its secured data lake.
Performance and scale
Despite introducing the tabular model to overcome performance issues, Microsoft actually added to complexity. The new semantic layer needs to offer vastly enhanced performance whether the underlying structure is a snowflake, a star or even pure OLTP. This will reduce the need for ETL and data wrangling, while shortening time to insight.
The legacy semantic layer required a lengthy build process, from minutes for small datasets to days or even weeks for large ones. This raised latency to levels that don’t align with the speed of modern enterprise decision-making. The new semantic layer enables querying of new data within minutes of its arrival in the data warehouse, with no full build, full rebuild, or manual ETL process to slow it down. Metadata, including data lineage, enables mapping from source fact tables to any temporary downstream data structures, which are updated incrementally to improve performance.
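To make the idea of incremental updates concrete, here is a minimal sketch of an aggregate that folds in only the fact rows it has not yet seen, instead of being rebuilt from scratch. Everything in it is an assumption for illustration: the IncrementalAggregate class, the (row_id, key, amount) row shape, and the use of a monotonically increasing row id as a high-water mark.

```python
from collections import defaultdict


class IncrementalAggregate:
    """Keeps per-key totals and counts, applying only unseen fact rows."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)
        self.high_water_mark = 0  # highest row id already folded in

    def refresh(self, fact_rows):
        """Apply rows of (row_id, key, amount) newer than the high-water mark.

        Returns the number of rows actually applied.
        """
        applied = 0
        for row_id, key, amount in sorted(fact_rows):
            if row_id <= self.high_water_mark:
                continue  # already reflected in the aggregate
            self.totals[key] += amount
            self.counts[key] += 1
            self.high_water_mark = row_id
            applied += 1
        return applied


# Hypothetical fact table: (row_id, region, amount).
facts = [(1, "EMEA", 10.0), (2, "APAC", 4.0)]
aggregate = IncrementalAggregate()
first_pass = aggregate.refresh(facts)

facts.append((3, "EMEA", 2.5))  # new data arrives in the warehouse
second_pass = aggregate.refresh(facts)  # only row 3 is processed
```

The second refresh touches one row rather than three, which is the property that keeps latency low as the fact table grows.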
Single version of the truth
A modern semantic layer needs to be able to create multiple, complex SQL statements in response to simplified user requests. This means it needs to be able to handle database loops, complex objects, complex sets (union, intersection), aggregate table navigation and join shortcuts. By applying rules to define database complexity and ambiguity, this SQL generation ensures that if two users ask for the same information, they get the same results – defining a single version of the truth.
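The sketch below shows, in miniature, how centrally defined rules can turn a simplified request into SQL so that two users asking for the same thing get the same answer. It is an illustrative assumption rather than any vendor's implementation: the MODEL dictionary, the generate_sql function, and the orders/customers tables are hypothetical, and it ignores loops, set operations and aggregate navigation. It uses Python's built-in sqlite3 module only to run the generated query.

```python
import sqlite3

# Hypothetical semantic model: one approved join path and one approved
# definition for each business term.
MODEL = {
    "from": "orders o JOIN customers c ON o.customer_id = c.id",
    "dimensions": {"region": "c.region"},
    "measures": {"revenue": "SUM(o.amount)", "order_count": "COUNT(*)"},
}


def generate_sql(dimensions, measures, model=MODEL):
    """Build one canonical SQL statement from business terms."""
    # Sorting and de-duplicating makes the output independent of the
    # order in which the user listed the terms.
    dims = sorted(set(dimensions))
    meas = sorted(set(measures))
    select = [f'{model["dimensions"][d]} AS {d}' for d in dims]
    select += [f'{model["measures"][m]} AS {m}' for m in meas]
    sql = f'SELECT {", ".join(select)} FROM {model["from"]}'
    if dims:
        group = ", ".join(model["dimensions"][d] for d in dims)
        sql += f" GROUP BY {group} ORDER BY {group}"
    return sql


# Small in-memory database to run the generated SQL against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# Two users phrase the same request differently.
sql_a = generate_sql(["region"], ["revenue", "order_count"])
sql_b = generate_sql(["region"], ["order_count", "revenue"])
result = conn.execute(sql_a).fetchall()
```

Because both requests resolve to the same statement, neither user can accidentally pick a different join path or a different definition of revenue.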
Why not data discovery?
Previous generations of semantic layer tools were so user-unfriendly that they essentially crippled BI development. Since these solutions couldn’t deliver on usability, manageable complexity or fast IT turnaround times, the market shifted to embrace “data discovery” tools like Tableau, Excel and Qlik.
These tools weren’t conceived to replace a centralized semantic layer, but rather to circumvent it altogether. By simplifying connections to relational database sources, these tools helped business users build queries and sidestep the older fixed data warehouse model. By distributing and duplicating logic, data discovery tools facilitated the creation of complex personal models that include combinations of data from the warehouse and other systems.
However, with the changes in data discussed above, the data discovery model ran into trouble. When hub-and-spoke models take center stage, data management becomes a major problem.
Data discovery tools have their place, but they were designed for discovery, not core BI. They came about because sophisticated users needed freedom and power that couldn’t be constrained by corporate IT. But the new semantic layer changes all this.
The semantic layer – grassroots demand
The shift to data discovery tools created a groundswell of demand for what is, de facto, a new semantic layer. End users of Tableau, Qlik, Excel and other frontends have already created a provisional new semantic layer, one that improves adoption, time to analysis and correctness for established and shared models by pushing down the definitional layer.
The challenge is that current BI tools focus on user flexibility at the expense of governance. Data extracts fed into BI tools are the labor-intensive fruit of manually-written and maintained complex ETL pipelines. The new, evolved semantic layer – whose groundwork is already laid – is a hybrid model that puts power in the hands of the user, and control, security and trust in the hands of IT.