14 leading tools for predictive analytics and machine learning

Forrester Research has identified 14 leading tools for predictive analytics and machine learning (PAML). The vendors in the research firm's assessment are Alpine Data, Angoss, Dataiku, Domino Data Lab, FICO, H2O.ai, IBM, KNIME, Microsoft, RapidMiner, Salford Systems, SAP, SAS, and Statistica.

Each of these vendors has a comprehensive, differentiated PAML solution; a standalone PAML solution; install base and revenue history; and motivated client inquiries.

The leaders

Seven predictive analytics and machine learning products were selected as leaders among the 14 evaluated.

Angoss KnowledgeSTUDIO
“Angoss KnowledgeSEEKER is a must-have for data science teams that wish to use beautiful and comprehensive visual tools to build decision and strategy trees,” according to Forrester. “It also offers KnowledgeSTUDIO for building models, InsightOPTIMIZER for numerical optimization, and KnowledgeMANAGER for model management. Angoss recently added a coding environment that allows data scientists to use programming languages including R, Python, and the language of SAS. It also has some integration with Hadoop and Apache Spark.”

FICO Decision Management Suite
“FICO’s extensive real-world experience has led to a solution that focuses on the needs of the chief data scientists as well as the rank-and-file data scientists in a large organization,” Forrester says. “Chief data scientists at mature enterprises demand three key things from data scientists: 1) explainable models, 2) accurate models, and 3) decision management. FICO’s Decision Management Suite encompasses the end-to-end capabilities needed to create, deploy, and monitor models for use in complex, consequential enterprise decisions. FICO needs to expand the number of algorithms it supports to compete more broadly.”

IBM InfoSphere and SPSS Modeler
“SPSS is still the core of IBM’s data science platform, but IBM is launching projects such as SystemML from its investments in its Spark Technology Center,” Forrester says. “IBM has also introduced the Data Science Experience for data science coders, which provides a quick cloud provisioning of open source Jupyter and/or RStudio notebooks with a Spark cluster on the back end to run data pipelines and train models. SPSS is a good fit for data scientists who want the productivity afforded by methods encapsulated in operators.”

KNIME Analytics Platform
“KNIME is not a big company, but it has a big community of contributors who continually push the platform forward with capabilities such as bioinformatics and image processing,” according to Forrester. “The KNIME Analytics Platform is free to download and use and includes over 1,000 analytical and model building operators. The vendor funds its ongoing operations by offering commercial extensions such as the KNIME Server for sharing workflows, advanced security, and remote execution of model building workflows. Maybe KNIME was smart for not taking gobs of venture funding during the big data rush.”

RapidMiner Platform
“RapidMiner invested heavily to revamp its visual interface, making it the most concise and fluid that we have seen in this evaluation,” Forrester explains. “It also has a comprehensive set of operators that encapsulate a wide range of data prep, analytical, and modeling functionality to increase productivity of data scientists. RapidMiner is open source and has a community that contributes to its growing list of operators. RapidMiner Studio is free to download and use for up to 10,000 data rows, with tiered pricing for more than 10,000 rows of data.”

SAP BusinessObjects Predictive Analytics, SAP HANA SPS
“SAP offers comprehensive data science tools to build models, but it is also the biggest enterprise application company on the planet,” Forrester notes. “This puts SAP in a unique position to create tools that allow business users with no data science knowledge to use data-scientist-created models in applications. SAP’s solution offers the data tools that enterprise data scientists expect, but it also offers distinguished automation tools to train models. The solution has plenty of room to grow into its existing applications customer base, but its dependence on SAP’s HANA data platform will limit its attractiveness to non-SAP customers.”

SAS Analytics Suite
“SAS is unifying its comprehensive portfolio of data science solutions under SAS Visual Suite,” Forrester notes. “It brings together world-class data prep, visualization, data analysis, model building, and model deployment. This unified tooling approach provides a consistent user experience that data scientists need to build even the most sophisticated models. SAS’s vision for data science is not limited to innovation in tools. It has been quick to jump on new, promising analytical methods across multiple disciplines, such as statistics, econometrics, optimization, machine learning, deep learning, and natural language interaction.”

Strong performers

Five predictive analytics and machine learning products were determined to be strong performers from the 14 products evaluated.

Alpine Data Chorus
“Data scientists spend an inordinate amount of time preparing data and conversing with business stakeholders compared with the time they spend on building valuable models,” Forrester says. “Alpine Data’s visual tool provides data engineers, data scientists, and business stakeholders with the capabilities they need to divide and conquer the work of building models. Data engineers can use the tool to prep data. Data scientists and business stakeholders can communicate using built-in collaboration features.”

Dataiku DSS
“A haiku is a Japanese form of poetry of 17 syllables — concise and evocative if done well,” Forrester explains. “That’s Dataiku’s guiding inspiration — to offer a data science platform that lets coders use a notebook when they must, but use visual tools to build workflows when productivity is at a premium. Dataiku is one of the new venture-funded startups that aim to be a well-rounded alternative to the long-time market competitors by offering a notebook experience embedded in a visual experience. With better model management capabilities, Dataiku is poised to challenge the leaders posthaste.”

H2O.ai H2O
“H2O.ai is best known for developing open source, cluster-distributed machine learning algorithms at a time (2011) when big data demanded them but no one else had them,” Forrester explains. “To say H2O.ai is an algorithm company today is an understatement. It also offers Sparkling Water to create, manage, and run workflows on Apache Spark and Steam to deploy models. Further, it offers Flow, a notebook-like experience similar to Jupyter. The company recently announced Deep Water, an amalgamated distribution of open source deep-learning libraries Caffe, MXNet, and TensorFlow.”

Microsoft Cortana Intelligence Suite, Microsoft R Server
“Microsoft offers Microsoft R for data scientists who wish to code in the R programming language supported by callable cluster-distributed algorithms,” Forrester explains. “It also offers Azure Machine Learning to data scientists who want a more traditional visual development tool. There is no reason why enterprises cannot enjoy both. Microsoft’s machine learning cloud services offer pretrained models for tasks such as image labeling, voice recognition, and natural language processing that allow developers with no data science knowledge to use them in applications.”

Statistica
“Statistica was founded in 1984 as Statsoft and acquired by Dell in 2014 as part of Dell’s focus on building an enterprise software portfolio,” Forrester says. “It is now part of the newly relaunched Quest Software. The Statistica solution is based on a data science workbench that has a rich set of algorithms and data prep tools that are especially relevant for manufacturers and scientific use cases. Statistica was a non-participating vendor in this evaluation.”

The contenders

Two of the evaluated predictive analytics and machine learning products were viewed as contenders.

Domino Data Lab Domino
“Domino Data Lab’s solution aims to package the most popular open source coding tools and libraries and provide a unifying interface for teams of data science coders,” Forrester explains. “But this approach also has a drawback: Many of the features critical to enterprises, such as model management and advanced workbench tools, lack open source options. We don’t think Domino can wait for the open source community to add critical enterprise features.”

Salford Systems SPM Salford Predictive Modeler software suite
“Salford Systems is adored by its community of customers, large and small, for its implementation of specific methods including CART, MARS, Random Forests, and TreeNet,” according to Forrester. “Most other vendor solutions have one or more of these methods, but Salford claims that its methods are the best because they are implemented by their inventors, including Jerome Friedman, a professor of statistics at Stanford University. Salford provides a workbench tool for modeling and has added automation scenarios to improve data science productivity. Its focus on creating the most accurate models has been at the expense of adding features like model management and big data analysis that a broader range of enterprise customers need.”
