Data Pros Spend Bulk of Their Time ‘Wrangling’ Data, Not Analyzing It

Issues around data preparation and data quality continue to plague many data professionals, who spend as much as 80 percent of their time trying to make raw data usable. That is the take of David Corrigan, chief marketing officer and vice president of product management at InfoTrellis.

Corrigan spoke with Information Management at the recent Strata & Hadoop World conference in New York. Corrigan said the data lake has become especially important for many organizations, but getting value from a data lake is easier said than done.


Information Management: What are the most common themes that you heard among conference participants?

David Corrigan: “There were a few common themes from the conference attendees with whom we met. The most common one was making their data lakes useful – in particular, how to free up data scientists to spend more time analyzing data and less time fixing it. Many attendees believe their data scientists spend upwards of 80 percent of their time wrestling with data issues.

“In particular, the attendees were interested in customer data – which was the most common data type they wanted to get from their data lake. They wanted to build a customer 360 from their vast data lakes and make it actionable for business applications and users.

“Related to that, the topics of governance and quality also arose – as they moved from analyzing data in lakes to potentially operationalizing it, they were more concerned with both of those issues.”


IM: What are the most common data challenges that attendees are facing?

Corrigan: “With respect to those technology sets (existing products), the biggest challenge I heard reported was the inability to effectively deal with big data (large data sets, semi-structured and unstructured data). Many attendees have a strong desire to master data, build a customer 360, and govern data from big data sources (really all data), but they were looking for new technologies to do that and augment their structured data MDM and governance systems.

“Many attendees spoke of the need for ‘relative’ data governance – or to understand the confidence levels in various data sources that build the customer 360, and to use them for different purposes (e.g., marketing campaign planning vs. web self-service) based on the confidence level in each data attribute.”
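As an illustration only, the ‘relative’ governance idea Corrigan describes can be sketched in a few lines of Python. This is not InfoTrellis code: the attribute names, source labels, confidence scores and per-purpose thresholds below are all hypothetical, chosen simply to show how a confidence level attached to each attribute could decide which uses that attribute is fit for.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    value: str
    source: str        # hypothetical label for where the value came from
    confidence: float  # hypothetical score between 0.0 and 1.0


# Hypothetical minimum confidence required for each business purpose.
THRESHOLDS = {
    "marketing_campaign": 0.5,  # tolerant of lower-confidence data
    "web_self_service": 0.9,    # customer-facing, so stricter
}


def usable_attributes(profile, purpose):
    """Return only the attribute values confident enough for the purpose."""
    minimum = THRESHOLDS[purpose]
    return {
        name: attr.value
        for name, attr in profile.items()
        if attr.confidence >= minimum
    }


# A made-up customer profile assembled from three sources.
profile = {
    "email": Attribute("pat@example.com", "crm", 0.95),
    "interest": Attribute("cycling", "social_feed", 0.6),
    "phone": Attribute("555-0100", "web_form", 0.4),
}

marketing_view = usable_attributes(profile, "marketing_campaign")
self_service_view = usable_attributes(profile, "web_self_service")
```

In this sketch the marketing view keeps the email and the interest, while the stricter self-service view keeps only the email; the low-confidence phone number is excluded from both.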


IM: What are the most surprising things that you heard?

Corrigan: “Many attendees said that governance and MDM (quality, matching customers) was being done ‘by hand’ in the data lake – so they had built some rudimentary quality checks as a first pass. They did so because existing tools were built for structured data and they didn’t believe those tools could meet their needs.

“InfoTrellis AllSight appealed to those clients, as it provided a product that could take unstructured big data from their lake or elsewhere, and build and govern a complete Customer 360.”


IM: What does your company view as the top data issues or challenges in 2016?

Corrigan: “The primary issue is that two of the four MDM use cases created by Gartner (Consolidation and Registry style MDM) have fundamentally changed. The purpose of both is to give a view of all customer data, and the ground has shifted in the 10 years since they were created because most customer data is now unstructured.

“Traditional MDM systems seem best suited to hybrid or transaction style MDM of core structured data, whereas new technologies seem better suited to consolidation or registry style MDM. InfoTrellis AllSight can manage consolidation and registry style customer MDM requirements.”


IM: How do these themes and challenges relate to your company’s market strategy this year?

Corrigan: “InfoTrellis will market AllSight to augment existing MDM implementations – many existing MDM customers are looking to augment MDM with a full unstructured ‘data store’ or to build an ‘enterprise customer 360’. Many of those organizations were planning to custom-build extensions to MDM with a big data/Hadoop platform.

“We also plan to market AllSight to organizations beginning their MDM journey as a great starting point to consolidate all customer data and provide a full 360 view to business applications.”
