With the incredible popularity of big data and Hadoop, every Business Intelligence (BI) vendor also wants to be known as a "BI on Hadoop" vendor. But what most of them can really do is limited to a) querying HDFS data organized in Hive tables using HiveQL, or b) ingesting flat files into memory and analyzing the data there.
Basically, to most BI vendors, Hadoop is just another data source. Let's now see what qualifies a BI vendor as a "Native Hadoop BI Platform".
If we assume that all BI platforms have to have data extraction/integration, persistence, analytics, and visualization layers, then "Native Hadoop/Spark BI Platforms" (OK, yes, I just had to add Spark) should be able to:
◾Use Hadoop/Spark as the primary processing platform for most of the aforementioned functionality. The only exception is the visualization layer, which is not something Hadoop/Spark provide.
◾Use distributed processing frameworks natively, including:
  ◾Generating MapReduce and/or Spark jobs
  ◾Having those jobs managed by YARN or another cluster resource manager
  ◾(Note: merely generating Hive or Spark SQL queries does not qualify)
◾Have declarative work done in the product's main user interface interpreted and executed on Hadoop/Spark directly, not via a "pass-through" mode.
◾Natively support Apache Sentry and Apache Ranger security
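To make the job-generation criterion above concrete, here is a minimal, vendor-neutral sketch of the map/shuffle/reduce stages that a generated MapReduce word-count job performs. All names here are illustrative; a real native platform would emit this logic as jobs scheduled across the cluster by YARN, rather than run it in a single process as this toy version does.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between
    # the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data on Hadoop", "BI on Hadoop"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["hadoop"])  # 2
```

The point of the criterion is that a native platform generates and submits this kind of distributed job itself; a vendor that only translates a dashboard into a Hive or Spark SQL string and hands it off is doing pass-through, not native processing.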
Did I miss anything?
(About the author: Boris Evelson is vice president, principal analyst at Forrester Research, serving application development and delivery professionals. This post originally appeared on his Forrester Research blog.)