It’s no secret that organizations are awash in data. They are creating more of it, and from more sources, which makes the challenge of managing it all that much harder.

For help, thousands of data and information technology professionals attended the recent Strata & Hadoop World conference in San Jose to get insights on how to improve that process. Information Management spoke with Ben Hopkins, senior product marketing manager, big data, at Pentaho (a Hitachi Group Company), for his sense of what was most on their minds.


Information Management: What are the most common data challenges that attendees are facing?

Ben Hopkins: Overall, there seems to be a strong interest in figuring out how to manage most or many phases of the enterprise analytics process in a coordinated fashion, ranging from central data processing and engineering, to line of business data preparation and predictive analytics, to data delivery to business users.

A main related goal is blending an array of complex/diverse data for analytics — whether the individual sources are a collection of relational databases, large swaths of machine data or data already living in Hadoop. 

For example, many attendees were interested in finding more manageable ways to parse and analyze semi-structured data like XML/JSON and log files, with the ultimate goal of extracting critical KPIs. 
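
As a rough illustration of the kind of task Hopkins describes, the sketch below parses a few log lines and JSON events and reduces them to two KPIs. The log layout, field names, and choice of KPIs are assumptions made for this example; none of them come from the interview or from any Pentaho product.

```python
import json
import re

# Hypothetical log layout, assumed for illustration: "<timestamp> <status> <latency_ms>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<status>\d{3}) (?P<latency_ms>\d+)$")


def parse_log_lines(lines):
    """Yield one record per well-formed log line; malformed lines are skipped."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield {
                "status": int(match["status"]),
                "latency_ms": int(match["latency_ms"]),
            }


def parse_json_events(lines):
    """Yield one record per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        yield {
            "status": int(event["status"]),
            "latency_ms": int(event["latency_ms"]),
        }


def compute_kpis(records):
    """Reduce parsed records to a request count, error rate, and mean latency."""
    records = list(records)
    if not records:
        return {"requests": 0, "error_rate": 0.0, "avg_latency_ms": 0.0}
    errors = sum(1 for r in records if r["status"] >= 500)
    total_latency = sum(r["latency_ms"] for r in records)
    return {
        "requests": len(records),
        "error_rate": errors / len(records),
        "avg_latency_ms": total_latency / len(records),
    }
```

Both parsers emit the same record shape, so records from either source can be concatenated and passed to `compute_kpis` together.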


IM: How do these challenges relate to your company’s market strategy this year?

Hopkins: Pentaho is positioned to add value at every stage of the analytic data pipeline, including data integration and blending, as well as the delivery of analytics not just to business users but also to applications and processes.

We’ve been focused on providing flexibility and scalability at each stage of the cycle, but one challenge in particular that we’ve homed in on recently is helping organizations ease the process of complex big data onboarding. This is really a major first step toward end-to-end data processes.

For instance, companies that need to migrate hundreds of relational database tables, ingest thousands of changing data sources into Hadoop, or even enable line of business users to onboard their own data into central systems face critical challenges related to manual, error-prone development and project timelines that are just too long. 

Pentaho has a critical ability to help automate these types of data onboarding processes, which has a lot to do with the ability to intelligently detect metadata from a variety of data sources and generate transformation and processing logic from that on the fly. This can drastically reduce the time needed to build out these processes. We call it metadata injection.
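
The general idea of detecting metadata and generating transformation logic from it can be sketched in a few lines. This is not Pentaho's implementation or API; the function names, the type-inference rule, and the sample data are all hypothetical, chosen only to show one transformation being derived from detected column types rather than written by hand for each source.

```python
from datetime import date

# Type names mapped to the callables that convert a string to that type.
CASTERS = {"int": int, "float": float, "date": date.fromisoformat, "str": str}


def infer_type(values):
    """Return the first type name, tried in order, that parses every sample value."""
    if not values:
        return "str"
    for name in ("int", "float", "date"):
        try:
            for value in values:
                CASTERS[name](value)
            return name
        except ValueError:
            continue
    return "str"


def detect_metadata(rows):
    """Infer {column: type_name} from sample rows (a list of dicts of strings)."""
    if not rows:
        return {}
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


def build_transform(metadata):
    """Generate a row-level transformation from the detected metadata."""
    def transform(row):
        return {col: CASTERS[type_name](row[col]) for col, type_name in metadata.items()}
    return transform
```

Under these assumptions, onboarding a new source means sampling it, calling `detect_metadata`, and passing the result to `build_transform`, so the same code serves any number of differently shaped tables.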


IM: What does your company view as the top data issues in 2016?

Hopkins: It’s not a new challenge, but it has been compounded by the proliferation of big data sources, volumes, and complexity. That is, we think the big challenge is establishing and maintaining full visibility, control, and governance over all data and analytics processes, from the raw source data all the way through to business insights, both internal and external to an organization.
