The Forrester Muse
for Information Management Blogs
SEP 25, 2012 2:24pm ET

BI Integration with Hadoop: What Does It Mean?

There's certainly a lot of hype out there about big data. As I previously wrote, some of it is indeed hype, but there are still many legitimate big data use cases. I saw a great example during my last business trip.

Hadoop certainly plays a key role in the big data revolution, so all business intelligence (BI) vendors are jumping on the bandwagon and saying that they integrate with Hadoop. But what does that really mean? First of all, Hadoop is not a single entity; it's a conglomeration of multiple projects, each addressing a certain niche within the Hadoop ecosystem, such as data access, data integration, DBMS, system management, reporting, analytics, data exploration, and much, much more. To lift the veil of hype, I recommend that you ask your BI vendors the following questions:

  • Which specific Hadoop projects do you integrate with (HDFS, Hive, HBase, Pig, Sqoop, and many others)?
  • Do you work with the community edition software or with commercial distributions from MapR, EMC/Greenplum, Hortonworks, or Cloudera? Have these vendors certified your Hadoop implementations?
  • Are you querying Hadoop data directly from your BI tools (reports, dashboards) or are you ingesting Hadoop data into your own DBMS? If the latter:
    a) Are you selecting Hadoop result sets using Hive?
    b) Are you ingesting Hadoop data using Sqoop?
    c) Is your ETL generating and pushing down MapReduce jobs to Hadoop? Are you generating Pig scripts?
  • Are you querying Hadoop data via SQL?
    a) If yes, who provides the relational structures? Hive?
    b) If Hive, who translates between SQL and HiveQL?
    c) Who provides transactional controls, such as multiphase commits?
  • Do you need Hive to provide relational structures, or can you query HDFS data directly?
  • Are you querying Hadoop data via MDX? If yes, please let me know which tools you are using, as I am not aware of any.
  • Can you access NoSQL Hadoop data? Which NoSQL DBMS: HBase, Cassandra? Since your queries are mostly based on SQL or MDX, how do you access these key-value stores? If you can, please let me know what use cases you have for BI using NoSQL, as I am not aware of any.
  • Do you have a capability to explore HDFS data without a data model? We call this discovery or exploration.
  • As Hadoop MapReduce jobs are running, who provides job controls? Do you integrate with Hadoop Oozie, Ambari, Chukwa, ZooKeeper?
  • Can you join Hadoop data with other relational or multidimensional data in federated queries? Is it a pass-through federation? Or do you persist the results? Where? In memory? In Hadoop? In your own server?
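
To make the last question concrete, here is a minimal Python sketch of the two federation styles it contrasts. It is only an illustration under stated assumptions: the function fetch_hadoop_rows is a hypothetical stand-in for whatever a vendor uses to pull a result set out of Hadoop (for example, a Hive query), an in-memory SQLite database stands in for the BI vendor's own relational store, and the sample data is invented. No real BI product or Hadoop API is shown.

```python
import sqlite3


def fetch_hadoop_rows():
    """Hypothetical stand-in for a Hadoop result set (e.g. from a Hive query).

    Yields (customer_id, clicks) tuples; the data is invented for illustration.
    """
    yield from [(1, 120), (2, 45), (3, 300)]


def make_warehouse():
    """Build an in-memory SQLite database standing in for the BI vendor's DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    return conn


def pass_through_join(conn):
    """Pass-through federation: join in memory, persist nothing from Hadoop."""
    names = dict(conn.execute("SELECT customer_id, name FROM customers"))
    return sorted(
        (names[cid], clicks)
        for cid, clicks in fetch_hadoop_rows()
        if cid in names
    )


def persisted_join(conn):
    """Persisted federation: ingest Hadoop rows into the DBMS, then join in SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks (customer_id INTEGER, clicks INTEGER)"
    )
    conn.execute("DELETE FROM clicks")  # refresh the local copy on every run
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", fetch_hadoop_rows())
    return conn.execute(
        "SELECT c.name, k.clicks FROM customers c "
        "JOIN clicks k ON c.customer_id = k.customer_id ORDER BY c.name"
    ).fetchall()


if __name__ == "__main__":
    warehouse = make_warehouse()
    print(pass_through_join(warehouse))
    print(persisted_join(warehouse))
```

Both functions return the same joined rows. The difference a buyer should care about is where the Hadoop data ends up: the pass-through version leaves no copy behind, while the persisted version creates a local table that has to be refreshed, secured, and sized.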

As you can see, you really need to peel back a few layers of the onion before you can confirm that your BI vendor REALLY integrates with Hadoop.

Curious to hear from our readers if I missed anything.

This blog originally appeared at Forrester Research.
