IT and data professionals are under increased pressure to deliver the goods when it comes to data – that is, insights that can help drive business decisions and boost profits. Toward that goal, many organizations are focusing on collaborative analytics to empower analysts and business users to get their jobs done with greater accuracy and speed.
Stephanie McReynolds, vice president at Alation, spoke with Information Management about what these trends mean and shared her observations on the top data analytics and data management themes to emerge from the recent Strata & Hadoop World event in San Jose.
Information Management: What were the most common themes that you heard at the conference?
Stephanie McReynolds: Re-defining data governance in the context of Hadoop and big data use cases was a key theme of Strata San Jose.
Sessions highlighted how the maturation of Hadoop implementations in the market requires tooling to support governed self-service. Among them was a session from our customer eBay, which spoke about tooling that helps thousands of analysts more effectively find, understand and trust their data stored in Hadoop and Teradata.
Trifacta, Navigator and Waterline presented a demonstration of what a realistic data governance workflow could look like in Hadoop. And Joe Hellerstein shared an open source metadata management project called Ground. Discussions around data stewardship, governed self-service analytics, and metadata were very topical for the community of attendees.
IM: What are the most common challenges that attendees were facing with regard to data management and data analytics?
SM: One, that machine learning delivers the true business value of big data. Machine learning has emerged as the most likely way that every organization will derive value from data stored in HDFS. No matter which processing engine is used to prepare the data and execute queries, machine learning algorithms are where big data value is derived.
Two, that broad-based analyst interaction with big data is important. Big data analysis should be a collaborative endeavor. Detailed knowledge about what the data means, how it was derived and its appropriate business uses sits in the heads of many different individuals in the organization. Finding ways to make big data more accessible to the entire team and encouraging data-driven discussions are key to the success of a Hadoop implementation.
Three, that there’s still a gap in data science talent. Finding the unicorn talent that can write Python, understand machine learning and communicate the results effectively to the business is still close to impossible.
Scaling those individuals through greater productivity is key, and finding out how to make your SQL-writing analysts Hadoop-savvy is perhaps more realistic. Many attendees were looking for tools to help them write queries more quickly with assisted query-writing, natural language search, and the ability to register and share queries.
IM: What were the most surprising things that you heard regarding data management initiatives?
SM: The idea of data curation as a means to manage data at the point of consumption was a new trend embraced by attendees. Data curation encourages self-service users of data to share their knowledge and in the process naturally reveals the data that is most used in the organization, where users could be best served by the development of data governance policies.
IM: How do these themes and challenges relate to your company’s marketing strategy this year?
SM: Alation helps people find, understand and trust their data. Our data catalog is generated and updated using machine learning and improved through human collaboration and the input of subject matter experts. It automatically captures the rich context of enterprise data, including what the data describes, who has used it, and the fit between the data and different types of analysis.
Companies often search for the unified search, query, and curation tools offered by a data catalog without knowing that the category of 'data catalog' exists. When they speak to us, they're surprised to learn that there are tools available to provide the data accessibility, trust, and context they seek.
IM: What does your company view as the top data issues or challenges in 2016?
SM: As companies seek to be more data-driven -- to get the most value from their data assets -- we see a shift toward collaborative analytics to empower analysts and business users to get their jobs done with greater accuracy and speed.
In addition, as the importance of well-defined data governance policy grows, we see data curation emerging as a technique to manage data quality at the point of consumption in support of defined policies.