6 trends impacting analytics and business intelligence practices
The democratization of data and collaborative analytics are the future of business intelligence and data analytics in the enterprise. The 2020 landscape will be dominated by the rise of Extract, Load, Transform (ELT), the increased traction of cloud data warehouses, and the spreadsheet renaissance.
Here are predictions on what we can expect in 2020:
JSON and semi-structured data have gone mainstream, making even more data available for analysis
JSON has become the de facto format for transferring data on the web. It is lightweight and readable by both humans and machines. When any two applications communicate with each other across the internet, the odds are they are doing so using JSON. Virtually every mainstream programming language can parse it.
By a commonly cited estimate, 80 percent of the data in an organization is unstructured or semi-structured, like the data in emails, documents, and wiki pages. Semi-structured data is pouring in from apps, websites, mobile devices, IoT devices, and sensors. This data used to be more difficult to search, manage, and analyze, and you previously needed separate on-premises storage systems to house structured and semi-structured data.
However, cloud data warehouses like Snowflake allow you to load semi-structured data into a column in a table and then access it natively, with some minor extensions to SQL. This saves people from having to parse it out and extract, load, and transform it into traditional tables and columns before anyone can query it.
Now structured and semi-structured data can be managed in the same system, and cloud data warehouses (CDWs) are making that data easier to store, analyze, and consume, enabling organizations to get the full value out of all their data.
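To make the idea of querying raw JSON in a column concrete, here is a minimal sketch. It does not use Snowflake or its SQL syntax. As a stand-in, it uses Python's built-in sqlite3 module and registers a hypothetical helper, json_path_get, as the "minor extension to SQL". The table, field names, and sample events are all invented for illustration.

```python
import json
import sqlite3


def json_path_get(doc, path):
    """Return the value at a dotted path inside a JSON string, or None."""
    value = json.loads(doc)
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    # SQLite stores scalars only, so nested values are re-serialized as text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


def open_warehouse(events):
    """Create an in-memory table and load each event, unparsed, into one column."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical SQL extension: expose the helper above to queries.
    conn.create_function("json_path_get", 2, json_path_get)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(json.dumps(event),) for event in events],
    )
    return conn


def devices_above(conn, threshold_c):
    """Filter on a field inside the raw JSON column at query time."""
    rows = conn.execute(
        "SELECT json_path_get(payload, 'device.id') AS device_id "
        "FROM events "
        "WHERE json_path_get(payload, 'reading.temp_c') > ? "
        "ORDER BY device_id",
        (threshold_c,),
    )
    return [row[0] for row in rows]


# Invented sample events; the third one has no reading at all.
EVENTS = [
    {"device": {"id": "sensor-a"}, "reading": {"temp_c": 21.5}},
    {"device": {"id": "sensor-b"}, "reading": {"temp_c": 35.0}},
    {"device": {"id": "sensor-c"}},
]

conn = open_warehouse(EVENTS)
print(devices_above(conn, 30.0))
```

Note that the event with a missing field loads without complaint and simply drops out of the filtered result. Nothing had to be flattened into fixed columns first.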
ELT is the wave of the future
There is an increasing desire to transform data as late in the game as possible. Prior to the development of ELT, the transform phase would happen as the data was moved to the data warehouse. This used to be necessary because storage and compute in the data warehouse were expensive.
The challenge with this approach is that you were out of luck if the transform removed or changed something needed for analysis, because the raw data never reached the warehouse. The emergence of the cloud data warehouse made it possible to load all the source data in its raw form, thereby delaying the transform and ensuring that no data is lost along the way.
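The difference is easier to see in a small sketch. The one below loads invented order rows untouched into a raw table and applies the transform only when a question is asked. It uses Python's built-in sqlite3 module as a stand-in for a warehouse, and the table, columns, and helper names are assumptions made for illustration.

```python
import sqlite3

# Invented source rows: (order_date, customer, channel, amount)
RAW_ORDERS = [
    ("2020-01-03", "acme", "web", 120.0),
    ("2020-01-04", "acme", "store", 80.0),
    ("2020-01-05", "globex", "web", 50.0),
]

# Columns that callers are allowed to group by.
GROUPABLE = {"customer", "channel"}


def load_raw(rows):
    """The 'L' step: land the source rows as-is, with nothing dropped."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_date TEXT, customer TEXT, channel TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
    return conn


def revenue_by(conn, column):
    """The 'T' step, run late: aggregate the raw table on demand."""
    if column not in GROUPABLE:
        raise ValueError(f"cannot group by {column!r}")
    query = (
        f"SELECT {column}, SUM(amount) FROM raw_orders "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return dict(conn.execute(query).fetchall())


conn = load_raw(RAW_ORDERS)
print(revenue_by(conn, "customer"))  # the question asked on day one
print(revenue_by(conn, "channel"))   # a question nobody anticipated
```

Had the pipeline aggregated by customer before loading, the channel column would never have reached the warehouse and the second question could not be answered.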
The re-emergence of the spreadsheet (interface)
Spreadsheets, in paper form, have been around for hundreds, maybe even thousands, of years, and in digital form since the 1970s, starting with VisiCalc. There have been all sorts of attempts to kill or replace the spreadsheet. We’ve tried all kinds of interfaces, but the one interface that allows people to ask questions, iterate, and collaborate with data remains the spreadsheet.
We are in the midst of a spreadsheet renaissance with many recent apps, like Smartsheet and Airtable, finding inspiration in the tried-and-true worksheet. This familiar interface offers power to the many smart, curious people in the world. It provides the possibility of bottom-up power. And it isn’t going anywhere.
The traction cloud data warehouses (CDWs) are experiencing will accelerate and Structured Query Language (SQL) will continue to be the dominant query language
CDWs have taken off for a number of reasons. These include scalability, flexibility, lower costs and connectivity. By storing all data in one place, organizations don’t have to deal with the complexity of searching various discrete business systems and data stores to locate the relevant data.
There are also significant cost savings. According to Amazon, running a data warehouse on your own costs between $19,000 and $25,000 per terabyte per year. On average, companies see 96% savings when they switch to a CDW.
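As a back-of-the-envelope check, the two figures can be combined. The sketch below assumes the 96% average saving applies uniformly across the quoted on-premises range. That is an assumption made here for illustration, not something either figure states.

```python
def cdw_cost_per_tb(on_prem_cost_usd, savings_pct=96):
    """Annual per-terabyte cost remaining after a given percentage saving."""
    return on_prem_cost_usd * (100 - savings_pct) / 100


# Quoted on-premises range, in US dollars per terabyte per year.
low = cdw_cost_per_tb(19_000)
high = cdw_cost_per_tb(25_000)

print(f"Implied CDW cost: ${low:,.0f} to ${high:,.0f} per terabyte per year")
```

Under that assumption, the implied cost is roughly $760 to $1,000 per terabyte per year.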
Data also expires, becomes irrelevant, or is replaced. It’s estimated that 60% of corporate data has lost some, or even all, of its business, legal, or regulatory value. A CDW greatly accelerates your time to access, analyze, and leverage data before it’s stale.
SQL was invented 45 years ago. Over time there have been many efforts to replace it. Some people thought of it as a relic that couldn’t scale, which led to the rise of NoSQL: Bigtable, Cassandra, MongoDB, and more. But the cracks in these databases have begun to show in the form of security, scalability, data consistency, and standardization challenges.
Some databases added their own “SQL-like” query languages. But there’s been a major return to SQL because it’s still the best language for interacting with databases and operating on the data in them. It’s superior in structure and compatibility, and because of its massive community backing, there’s been a lot of focus on security and efficiency for SQL systems. According to Stack Overflow, it’s the third most popular technology.
Bonus: One of the most powerful and impactful capabilities of the CDW is the ability to just dump raw data and analyze it right in place. When you pair that capability with the accessibility and simplicity of SQL, it’s easy for organizations to make any updates, changes, or deletions required under GDPR and other privacy laws.
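As a rough illustration of how little SQL an erasure request can require, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The page_views table, its columns, and the erase_user helper are hypothetical. A real compliance process would also have to account for backups, derived tables, and downstream copies, which this sketch ignores.

```python
import sqlite3


def erase_user(conn, user_id):
    """Delete every row tied to one user and return how many were removed."""
    with conn:  # commits on success, rolls back if the statement fails
        cursor = conn.execute(
            "DELETE FROM page_views WHERE user_id = ?", (user_id,)
        )
    return cursor.rowcount


# Invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("u1", "/home"), ("u1", "/pricing"), ("u2", "/home")],
)

removed = erase_user(conn, "u1")
remaining = conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]
print(f"removed {removed} rows, {remaining} remaining")
```

The user ID is passed as a query parameter rather than pasted into the SQL string, which keeps the request safe against injection.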
Augmented analytics hold promise, but humans still need to be kept in the loop
We have all of this data at our disposal to analyze, and people are excited about automating the formation of insights from it. Augmented analytics is a process that combines AI and ML techniques to change the way analytical data is generated, processed, and shared.
The goal is to surface critical insights with less time, less skill, and less bias. As with most technology, there’s a drive to eliminate humans, or at least remove them as much as possible, and have AI do most of the heavy lifting or run completely autonomous systems. But AI isn’t perfect and never will be. You actually lose a lot when you take humans out of the picture, and, surprisingly, you encounter the same problems when you rely solely on AI. Bias is one of them, and it compounds over time.
There’s this concept in machine learning, ‘human in the loop.’ This means that when teaching AI how to do things, you inevitably encounter edge cases, or situations that are too close to call. In these moments, you need a human in the loop to make that judgement call. Humans are better at that than computers, and that, in turn, helps the AI system improve its judgement making ability.
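One common way to put this into practice is to route low-confidence predictions to a person and feed their decisions back as training examples. The sketch below is a minimal illustration of that pattern, not a reference implementation. The Prediction class, the 0.8 threshold, and the sample transactions are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # the model's confidence in its label, 0.0 to 1.0


def triage(predictions, threshold=0.8):
    """Split predictions into auto-accepted ones and ones needing review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review_queue = [p for p in predictions if p.confidence < threshold]
    return accepted, review_queue


def apply_human_labels(review_queue, human_labels):
    """Turn reviewer decisions into new (item_id, label) training examples."""
    return [
        (p.item_id, human_labels[p.item_id])
        for p in review_queue
        if p.item_id in human_labels
    ]


# Invented model output: the second case is too close to call.
predictions = [
    Prediction("txn-1", "fraud", 0.97),
    Prediction("txn-2", "legitimate", 0.55),
    Prediction("txn-3", "legitimate", 0.91),
]

accepted, review_queue = triage(predictions)
# A reviewer overrules the model on the uncertain case.
new_examples = apply_human_labels(review_queue, {"txn-2": "fraud"})

print([p.item_id for p in accepted])
print(new_examples)
```

Where to set the threshold is itself a judgment call: a lower value sends fewer cases to people but lets more borderline decisions through unreviewed.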
It’s an attractive proposition to have a fully automated business intelligence system, automatically surfacing insights. But humans are smart. We can perceive things that are not easily quantified. We can draw connections between events that are not obvious. We have tacit knowledge. We can’t underestimate the human experience and the human brain.
Dr. Lance Eliot, CEO at Techbrium Inc and executive director of the Cybernetic AI Self-Driving Car Institute, wrote a great piece that talked about the Viking Sky cruise ship accident that happened earlier this year and how having humans in the loop could have prevented it. The rough seas caused the oil-level sensors to be triggered, alerting that the amount of oil was dangerously low, nearly non-existent.
It makes sense to shut down the engines if there is no oil, so this was an automated behavior. But there was oil; the sea was just extremely choppy. The system cut the engines anyway, turning the ship into a bobbing cork. No humans were alerted to this before it happened. Hundreds of passengers and crew had to be airlifted in a very dangerous operation.
A human could have easily reasoned that the rough seas were responsible for the low oil reading, and a human could have steered the ship to a safer place before cutting the engine. Humans make mistakes as well. But humans do add unique capabilities and context to the decision making process.
- Humans add additional intelligence to the process.
- Humans introduce emotion or compassion into the process.
- Humans can detect and mitigate runaway automation.
- Humans can detect and overcome nonsensical automation.
- Humans shore up automation gaps.
- Humans provide guidance to automation.
So really, the best systems are partnerships. You can’t and shouldn’t completely remove humans from the equation. If anything, the best systems amplify human strengths and bring more humans into the decision making.
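The partnership idea can be sketched as a guard around an automated action: when context suggests an alarm might be a false reading, the system asks a person instead of acting on its own. The example below is purely illustrative and says nothing about how any real vessel's control system works. The Action values, the decide function, and the choice to shut down automatically in calm conditions are all assumptions.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    SHUT_DOWN = "shut_down"
    ESCALATE = "escalate_to_operator"


def decide(oil_level_low, rough_conditions):
    """Choose an action from an alarm plus one piece of surrounding context."""
    if not oil_level_low:
        return Action.CONTINUE
    if rough_conditions:
        # The reading may be an artifact of the conditions, so ask a person
        # before doing anything irreversible.
        return Action.ESCALATE
    # Hypothetical policy: with no conflicting context, act automatically.
    return Action.SHUT_DOWN


print(decide(oil_level_low=True, rough_conditions=True).value)
print(decide(oil_level_low=True, rough_conditions=False).value)
```

The automation still handles the clear-cut cases, and the human is brought in exactly where the signals disagree.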
Self-service business intelligence is no longer a ‘nice to have’; it’s absolutely essential in the complex, fast-moving era we live in
Self-service analytics and business intelligence have long been the holy grail, especially the ability for domain experts to perform queries and run reports with as little IT support as possible. Many organizations claim to have it, but what they really have are dashboards: a set of prepared analytics that provide very little opportunity to dive further or ask deeper questions. And as every department becomes more data driven, this becomes very limiting.
Take marketing, for example. Marketers have moved beyond operating off of hunches and gut feelings and are clamoring to be data driven. More often than not, marketers are only looking at pre-built dashboards in individual apps with data from the last 90 days. The questions this interface can answer are limited and tactical.
Effective, holistic strategies require the ability to see trends over time or see how multiple things are impacting each other. This kind of analysis requires access to large volumes of data from multiple sources. It goes way beyond a simple dashboard. And this idea of self-service can extend to data pipelines themselves.
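As a small illustration of the kind of question a single-app dashboard cannot answer, the sketch below joins two invented sources, ad spend from one tool and signups from another, to track cost per signup by month. The data, function names, and the metric itself are assumptions chosen for illustration.

```python
from collections import defaultdict


def monthly_totals(rows):
    """Sum (date 'YYYY-MM-DD', value) rows into totals keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for date, value in rows:
        totals[date[:7]] += value
    return dict(totals)


def cost_per_signup(ad_spend_rows, signup_rows):
    """Combine two sources into one monthly trend."""
    spend = monthly_totals(ad_spend_rows)
    signups = monthly_totals(signup_rows)
    # Only months present in both sources can be compared.
    months = sorted(spend.keys() & signups.keys())
    return {m: spend[m] / signups[m] for m in months if signups[m] > 0}


# Invented exports from two separate tools.
AD_SPEND = [
    ("2019-10-05", 1000.0),
    ("2019-10-20", 500.0),
    ("2019-11-11", 1200.0),
    ("2019-12-02", 900.0),
]
SIGNUPS = [
    ("2019-10-31", 50),
    ("2019-11-30", 60),
    ("2019-12-31", 30),
]

trend = cost_per_signup(AD_SPEND, SIGNUPS)
for month, cost in trend.items():
    print(month, f"${cost:.2f} per signup")
```

Neither source alone shows that signups became cheaper in November and more expensive again in December. The pattern only appears once the two are combined over time.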