A question I receive quite a bit from people who are newly introduced to NoSQL platforms and the overall concept of multi-structured data sets is the following:
Can you use NoSQL for business intelligence and analytics?
This is a perfectly acceptable query. You have returned from a conference on NoSQL or big data and you are looking for ways to utilize this newfound domain within your organization. Or better yet, you have been sent to do research by your boss who has returned from one of the previously mentioned conferences or who found a “shiny new toy” article in a technical magazine.
However, I have often found that the question is better redefined as the following:
Can you use a NoSQL data store to provide data to a business intelligence platform or an analytics system?
In fact, a more direct version of the question might be:
Can I connect a Hadoop HDFS instance or MongoDB store to my BusinessObjects or MicroStrategy implementation and have it work without problems, because my boss wants it to?
To be perfectly honest, these are two different types of questions, and I think that professionals pushing the envelope of their business should understand the difference(s). I will try to put some context around the concept.
Understanding the Question
If you are a fan of Douglas Adams and the “Hitchhiker’s Guide to the Galaxy,” you might know that a computer known as Deep Thought was tasked with finding the answer to the question “Life, the Universe and Everything.” Deep Thought’s answer was somewhat confounding to the people who received it when the computer replied: “I think the problem, to be quite honest with you, is that you've never actually known what the question is," said Deep Thought. "So once you know what the question actually is, you'll know what the answer means."
I am not saying that people new to NoSQL do not understand the core question of whether or not you can use NoSQL for BI and/or analytics. But there is a difference between using multi-structured data sets in association with your business intelligence and analytics practices and connecting a NoSQL data store to a BI platform based on SQL for data access.
Life, the Universe and Everything
The first question is about concepts and domains. You can most certainly use multi-structured data that is often stored within a NoSQL database for business intelligence and analytics. Further, there is no real reason why you cannot use document, graph or wide column data sets to augment structured data sets in your BI program or as the basis for analytics in your BI program. For charts and graphs based on “standard” analytical practices that I often like to refer to as “counting” (e.g., SUM, COUNT, MIN, MAX, AVG, GROUPBY), you will need to “squash” the multi-structured data into a relational structure or utilize some of the NoSQL access techniques that I spoke about last time via SQL. For advanced analytical practices, whether the data is structured or multi-structured, you often need to prepare the data for analytical models - so there is no reason that you cannot use multi-structured.
Six times Seven
The second question is about data access and platform integration. Many standard BI platforms that create dashboards or pixel perfect reports are based on the long-standing (and often long-cursed) SQL data access standard and utilize JDBC/ODBC drivers to make “plumbing,” or integration connections, between data stores and their platforms. SQL represents 25 odd years of development and standardization over its lifetime. And that standardization and development continues to this day with improvements for temporal processing, geolocation, etc.
That said, you would be hard-pressed to take a dashboarding/multi-dimenionsal analytics platform that relies heavily on the advancements in SQL development (such as IBM Cognos or Oracle Essbase) and “simply” connect it to a NoSQL data store without any issues. I am not saying that you couldn’t do this; connecting to HDFS via Hive and HQL and providing “just” simple queries to a dashboarding platform should (knock on wood) work just fine. You can do simple analytics such as SUM, COUNT, etc. However, if you ask a NoSQL platform via a less than SQL-92 conformant data access interface to provide deep levels of “cube” aggregation, the odds are that you will have issues with this technical integration and you won’t get the expected results.
What say the readers?
Do you see a place for NoSQL in BI/analytics in your organization? Do you have the ability to test multi-structured data sets within BI without connecting MongoDB to Cognos? Have you connected Hive to BusinessObjects with success? Do you know the answer to “Life, the Universe and Everything”? (HINT: it is mentioned above.)
Provide your comments below and/or ping me via Twitter at @JohnLMyers44 with the hashtag #noodlingNoSQL.