Big data fabric is an emerging platform which accelerates business insights “by automating ingestion, curation, discovery, preparation and integration from data silos,” according to Forrester, in its new report “The Forrester Wave: Big Data Fabric, Q2 2018.” The research firm explains that big data fabric can support many types of use cases, including real-time insights, machine learning, streaming analytics and advanced analytics. “It offers data professionals the ability to orchestrate data flow and curate data across various big data platforms (such as data lakes, Hadoop, and NoSQL) to support a single version of the truth, customer personalization and advanced big data analytics — with zero or minimal coding.”
The leaders identified by Forrester Research support a broader set of use cases, provide enhanced AI and machine learning capabilities, and offer good scalability features.
“Denodo offers a credible big data platform that helps users build an enterprise-wide big data fabric quickly,” Forrester says. “Denodo's key strength lies in its unified data fabric platform that integrates all of the key data management components needed to support real-time and dynamic use cases, such as real-time analytics, fraud detection, portfolio management, healthcare analytics and IoT analytics.”
“IBM's key strengths lie in its connectivity to legacy platforms, good security frameworks, AI and machine learning capabilities across various data tiers, data governance, granular security and performance,” Forrester says. “In addition, IBM Global Business Services has been the key component for large and complex big data fabric deployments, especially ones that require customization and supporting extreme scale.”
“Its key strengths lie in its security and governance capabilities, highly scalable data movement and transformations that can be done in real-time streaming environments,” Forrester says. “Its customers use big data fabric to support various use cases, including real-time analytics across disparate data sources (such as data lakes), customer intelligence, IoT applications, and other big data applications and insights.”
“The Paxata platform provides an interactive enterprise data fabric solution that comprises several technologies to support real-time integration, quality, governance, collaboration and enrichment,” Forrester says. “Its AI and machine learning algorithms help business analysts easily understand, categorize, integrate and connect data more quickly.”
“Talend has several technologies that make for a fabric that supports real-time, batch, and dynamic data-driven use cases across on-premises, cloud and hybrid environments,” Forrester explains. “Talend's platform simplifies the process of working with Hadoop and Spark distributions as well as new technologies like serverless computing and containers, requiring no coding to perform various fabric-related activities.”
The strong performers identified by Forrester Research have turned up the heat on the incumbent leaders by offering more data management features and deployment options.
“Its key solution, the Anzo Smart Data Lake, allows technology management pros, analysts and business users to semantically link, analyze, and manage diverse data sets, whether on-premises or in the cloud,” Forrester says. “Unlike other evaluated vendors, Cambridge Semantics has been building its semantic layer using knowledge graph through its Anzo Graph technology.”
“It has broadened the data platform to support more data management capabilities: some that are open source, such as Apache Hive, Apache HBase, Apache HCatalog and Apache Avro; others that are closed source to support security, governance, integration, catalog, workload management and data preparation,” Forrester says.
Unlike other big data fabric vendors, Hortonworks aims to build a completely open source platform, Forrester explains. “The Hortonworks Data Platform includes Apache Hadoop, MapReduce, Hive, HBase, Atlas, Ranger, Spark and other components. Hortonworks has additional services around data stewardship through its Data Steward Studio, and it plans to introduce other fabric-based services.”
“The SAP platform can distribute data fabric operations such as integration, aggregation, and transformation across HANA, Hadoop, data warehouse and Spark clusters,” Forrester says. “Enterprises use SAP's big data fabric to support various use cases, including a 360-degree view of the customer, fraud detection, the IoT and real-time insights.”
“Syncsort is still expanding its AI and machine learning capabilities natively into the fabric architecture, but companies can use task extensions and custom functions to enable such capabilities,” Forrester explains. “Customers like its scale and performance, including integration of data quality, especially when dealing with petabytes of data.”
“Trifacta continues to leverage AI and machine learning algorithms to automate and simplify the processing of data and interaction with data for analysts and business users,” Forrester says. “Trifacta visually tracks and presents the lineage of data transformation steps for specific data sets and across multi-data-set-wrangling workflows.”
The contenders identified by Forrester Research are ramping up their offerings to expand their core functionality.
“In 2017, Pentaho became part of Hitachi Vantara, a company that unifies the operations of Pentaho, Hitachi Data Systems and Hitachi Insight Group,” Forrester notes. “Pentaho is known for its data integration solution and is extending the platform to support broader big data management initiatives, which include integration with Hadoop, Spark, and NoSQL to provide end-to-end data management capabilities for security, governance, integration and transformation.”
“Informatica's legacy of strong information management offerings has paved the way to extend its platform to the cloud and hybrid cloud, supporting broader big data fabric use cases such as the IoT, real-time operational intelligence, fraud detection, social networking, and a 360-degree view of the customer,” Forrester says.
“Podium Data provides a big data management platform that includes data profiling, metadata, data integration, data security, data transformation, discovery and data processing,” Forrester says. “Its data lake focuses primarily on Hadoop to support data preparation and data management capabilities for business users and data analysts to deliver actionable insights quickly.”
“TIBCO Software's recent acquisition of Cisco Systems' Data Virtualization business will help it support a comprehensive data fabric platform by leveraging various in-memory, graph, analytics, data quality and data management technologies from both vendors,” Forrester explains. “In addition, the acquisition will support new business use cases that require orchestration of silos in real time with self-service and AI and machine learning capabilities.”