7 steps to success with predictive analytics and machine learning

Published March 30, 2017, 6:55am EDT

Most organizations are well positioned to explore predictive analytics and machine learning (PAML), but many struggle with how to launch the effort, says Mike Gualtieri, vice president and principal analyst with Forrester Research. In his latest Forrester Wave report on predictive analytics and machine learning, Gualtieri offers these tips to help organizations get a PAML effort right.

Drive data scientist productivity

“Data scientists are in demand, and it can be hard to find good ones,” notes Gualtieri. “Instead of hiring three more data scientists, perhaps an enterprise could dramatically improve the productivity of existing data scientist teams. Many of the PAML solution vendors focus on speeding up analysis by using big data platforms such as Apache Spark, automating portions of the data science life cycle, and improving usability of the data science workbench.”

Include multiple model deployment methods

“Production models shouldn’t sit statuesque on their own,” Gualtieri says. “They must be embedded in applications and business processes to provide business value. Enterprises must be able to deploy models in multiple ways, including as code embedded directly into applications, exposed as a service callable by applications, or injected into other platforms such as databases. Some of the more mature PAML vendors include or are integrated with decision management platforms that allow AD&D [application development and delivery] pros and business users to use a visual metaphor or express decision logic as a set of business rules that can also include models.”
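
To make the first two deployment styles concrete, here is a minimal Python sketch, not taken from the Forrester report. The model, its weights, and the names `score` and `handle_request` are all hypothetical. The point is only that one scoring function can be called directly by an application (embedded) or wrapped in a JSON interface that a web framework could expose (service).

```python
import json
import math

# Hypothetical logistic model; the weights and bias are made up for illustration.
WEIGHTS = {"tenure_months": -0.04, "support_tickets": 0.30}
BIAS = -0.5


def score(features: dict) -> float:
    """Embedded deployment: the application calls the model as ordinary code."""
    z = BIAS + sum(WEIGHTS[name] * float(features[name]) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body: str) -> str:
    """Service deployment: the same model behind a JSON-in, JSON-out interface.

    An HTTP server or framework would call this with the request body.
    """
    features = json.loads(body)
    return json.dumps({"score": score(features)})


if __name__ == "__main__":
    customer = {"tenure_months": 12, "support_tickets": 3}
    print(score(customer))                       # direct, in-process call
    print(handle_request(json.dumps(customer)))  # what a service would return
```

Keeping the scoring logic in one function and layering the service on top means both deployment paths return the same prediction for the same input.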

Provide sophisticated model management

“The very nature of predictive models is that they may lose accuracy over time,” Gualtieri explains. “More mature PAML solutions include features to monitor the ongoing efficacy of models in production by comparing model output with established key performance indicators and testing new models using a champion/challenger or A/B testing scheme.”
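
A champion/challenger check can be sketched in a few lines of Python. This is an illustrative simplification, not any vendor's implementation. The accuracy KPI floor, the promotion margin, the function names, and the sample data are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], int]  # hypothetical model type: one numeric input, a 0/1 label


def accuracy(model: Model, labeled: Sequence[Tuple[float, int]]) -> float:
    """Share of recent labeled examples the model predicts correctly."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    return sum(1 for x, y in labeled if model(x) == y) / len(labeled)


def compare(champion: Model, challenger: Model,
            labeled: Sequence[Tuple[float, int]],
            kpi_floor: float = 0.8, margin: float = 0.02) -> Dict[str, object]:
    """Score both models on the same recent data and flag what needs action."""
    champ_acc = accuracy(champion, labeled)
    chall_acc = accuracy(challenger, labeled)
    return {
        "champion_accuracy": champ_acc,
        "challenger_accuracy": chall_acc,
        "champion_below_kpi": champ_acc < kpi_floor,          # production model has drifted
        "promote_challenger": chall_acc >= champ_acc + margin,  # challenger clearly better
    }


# Made-up recent outcomes: the true cut-off has moved from 0.5 to about 0.3.
recent = [(0.1, 0), (0.2, 0), (0.25, 0), (0.35, 1), (0.4, 1),
          (0.45, 1), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
report = compare(lambda x: int(x > 0.5), lambda x: int(x > 0.3), recent)

if __name__ == "__main__":
    print(report)
```

The margin is there so that a challenger is not promoted on a difference small enough to be noise. A real scheme would also account for sample size.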

Allow polyglot programming

“Data scientists who are coders are increasingly using more than one programming language because of open source add-on libraries such as CRAN for R and scikit-learn for Python,” Gualtieri says. “Enterprise data scientists who still use the SAS programming language may also use R, Python, Scala, Julia, and others because of this.”

Expand to Apache Spark

“Apache Spark is an open source, primarily in-memory cluster computing platform that also includes Spark ML, a set of machine learning libraries that data scientists are increasingly interested in using,” Gualtieri says. “In addition to Spark ML, other machine learning libraries such as H2O.ai’s Sparkling Water and IBM’s SystemML run on Spark. Most PAML vendors have moved from a Hadoop strategy for analyzing big data to an Apache Spark strategy because of the machine learning libraries and speed of in-memory processing.”

Build the foundation for AI and invest in deep learning

“Machine learning models are a key building block of AI applications,” Gualtieri explains. “Deep learning is a branch of machine learning that data scientists use to build models based on artificial neural networks. This method is particularly good at creating models for image recognition (including facial recognition), but it is applicable to more traditional use cases as well. Vendors are incorporating numerous open source libraries, such as Caffe, MXNet, and TensorFlow, into PAML solutions, or they are creating their own deep-learning algorithms built into the platform.”

Accommodate citizen data scientists

“Many enterprises like the idea that non-data scientists in their organizations can create models without having in-depth data science knowledge,” Gualtieri says. “PAML vendors have responded by creating wizard-like tools in their PAML solutions to make it easy for these citizen data scientists to create simple models. While these tools may have some benefits for nonproduction models used for exploratory business intelligence (BI), an enterprise should not think that this will replace real data scientists, because there are too many complexities of model building, such as feature creation, model evaluation, over-fitting, and creating ensembles.”
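
Over-fitting, one of the complexities named above, can be illustrated with a small Python sketch. The synthetic data, the 20 percent label noise, and both toy models are assumptions chosen for the illustration and do not come from the report. A model that memorizes its training data scores perfectly on that data, and only a held-out test set shows how much of that was noise.

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def memorizer(train):
    """Over-fit model: returns the label of the nearest training point (1-NN)."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def threshold_rule(x):
    """Simple model that ignores the noise and uses the underlying cut-off."""
    return int(x > 0.5)


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so runs are repeatable
train, test = make_data(200, rng), make_data(200, rng)
overfit = memorizer(train)

if __name__ == "__main__":
    for name, model in [("memorizer", overfit), ("threshold", threshold_rule)]:
        print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

Judging the memorizer by its training score alone would make it look like the better model, which is the kind of evaluation mistake a wizard-style tool cannot rule out on its own.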