8 top vendors for self-service data preparation products

July 26, 2018, 6:49am EDT

The growing popularity of self-service analytics tools has resulted in a more flexible, iterative approach to data preparation, according to the new report “Ovum Decision Matrix: Selecting a Self-Service Data Prep Solution, 2018–19.” Report author Paige Bartley, senior analyst for data and enterprise intelligence at Ovum, says, “The number of data consumers in the typical enterprise has grown massively over time, and over the last decade, business intelligence and visualization tools have increasingly added in guided functionality aimed at further expanding the scope of the user base to nontechnical business users.” Below, Bartley assesses eight of the top products in this space.

Market leaders

“The leading vendors, regardless of their architectural approach, are typically notable for their high scores in the data governance and the collaboration and machine learning categories in the technology features assessment,” Bartley explains. “Additionally, they edged out others in technology categories such as data manipulation, which had closely clustered scoring. For execution categories, market leaders had solutions that tended to score better on maturity and deployment.”

ClearStory Data

“ClearStory Data provides an end-to-end platform to allow business users to engage in automated data inference, automated data prep and automated data harmonization,” Bartley says. “With its intuitive visual user interface and smart, machine learning-driven recommendations for joining data, general business users (not just analysts) can get up and running with little training or guidance. It additionally includes native visualization capabilities, via its StoryBoards interface, allowing users to explore and discover insights by interacting directly with data.”

Datameer

“Datameer is a broad and complex platform, originally built on implementations of Hadoop but now native in the cloud, that is focused on building and managing the data pipelines that enable data to be fed into any analytic tool,” Bartley explains. “The current platform performs ingestion, integration, prep, enrichment, exploration, and some visualization. It is well-suited to IT ecosystems that are complex and high-scale. With connectors to more than 70 data sources beyond the Hadoop ecosystem, it provides immense flexibility and connectivity, allowing it to be a central hub for data prep and staging before data is sent to an analytics tool.”

Trifacta

“Trifacta offers a standalone approach to data prep, fixated on best-in-breed functionality,” Bartley explains. “The company's focus is purpose-built data prep independent of, and interoperable with, a broad variety of BI, data science, governance, and storage and processing environments, both on premises and in the cloud. Its approach is based on a robust strategy of integrations and partnerships. Trifacta's ‘wrangle once, use anywhere’ approach, coupled with its extensive partnerships, integrations, and cloud compatibility, allows the enterprise maximum flexibility and choice in its IT deployments and broad support for a variety of end-user data consumption models.”

Unifi

“Unifi takes a single-platform approach to address all self-service needs – including data prep – leading up to the visualization and analysis process, providing a single collaborative environment that is bound by consistent application of governance and policies,” Bartley says. “They offer in one platform an integrated set of capabilities that span four core ‘pillars’ of self-service functionality: governance and security, catalog and discovery, data preparation, and workflow and scheduling. Data cataloging and artificial intelligence capabilities, in particular, are strongly integrated and natively developed as part of the platform.”

Market challengers

“Market challengers are grouped not far behind the market leaders; as all are mature companies, their ‘challengers’ categorization is typically defined more by their lower technology features scores than by their execution scores,” Bartley says. “Their average scores do not tell the entire tale; in several cases, these vendors were brought down in the rankings by a few categories in which they scored disproportionately low, rather than by consistently lower scores.”

Alteryx

“Alteryx offers Alteryx Designer, which provides a bench of more than 250 data tools for preparing and analyzing data from over 80 sources,” Bartley notes. “It has the capability to integrate with additional Alteryx products that provide capabilities around data management (discovery and cataloging), collaboration, and analytics model deployment and model management. Dual options for product use – code-free and code-friendly – ensure that all self-service enterprise users can prep and analyze data equally in the same environment.”

Datawatch

“Datawatch, with its Monarch product, provides purpose-built data prep,” Bartley says. “It excels at handling semistructured and unstructured data types, easily extracting alpha, numerical, and date data from documents such as PDFs and incorporating them into the data prep workflow. Usability and breadth of core data prep functionality are two key selling points. By catering to nontechnical business users and power users alike, the product reaches an extended enterprise audience. It scores above average on the data manipulation and collaboration and machine learning technology categories, but suffers disproportionately in the areas of integration and exploration and administration.”

Oracle

“Oracle is unique in this Ovum Decision Matrix as the only vendor evaluated that tightly embeds its data prep functionality in a dedicated BI, analytics, and visualization platform: Oracle Analytics Cloud,” Bartley says. “Embedding data prep in the visualization environment makes it easy for users to seamlessly transition from prep to analysis, lending Oracle above-average scores in several of the technology categories. However, this architecture does not play well with other analytics environments, leading to a disproportionately low score on the data output and analytics technology assessment category, which rewards an agnostic approach.”

Market followers

“As the self-service data prep market is largely clustered together by scores on data prep technology and execution measures, the determination of follower was made largely by separation from this primary pack, defined by significantly lower technology and execution scores,” Bartley explains. “Market followers tend to be either recent entrants to the market or those that are aimed at meeting the needs of a particular kind of customer.”

IBM

“As the only vendor in this assessment that embeds its self-service data prep module as part of a data science and machine learning platform – IBM Watson Studio – it is uniquely reaching an audience of forward-leaning organizations that are specifically trying to operationalize machine learning,” Bartley says. “While most data prep vendors in this assessment are designed with the ultimate end goal of feeding prepped data into self-service visualization and analysis tools, the IBM self-service data prep capabilities are designed to feed directly into machine learning and deep learning models. This gives the IBM product a unique, and noteworthy, approach to consider as a short-list contender for any organization that is looking to scale up data science efforts.”