Advances in artificial intelligence (AI) and machine learning are increasing the power of predictive analytics to boost bottom lines. Does that mean that smart machines are about to replace humans in higher-complexity jobs? No doubt, smart machines are getting smarter. But even the smartest machines lack fundamental human characteristics that are critical to solving certain types of problems, whether the solver is human or machine. One of these key capabilities is curiosity. But how can we replicate that?

For the answer, we need to look at neuro-dynamic programming. It’s an analytic method for learning and anticipating how current and future actions are likely to contribute to a long-term cumulative reward. This technique is related to advanced AI reinforcement learning methods, which take inspiration from behaviorist psychology to connect future reward/penalty back to earlier steps in a decision-making process. That contrasts with traditional supervised learning, which attributes reward only to the current decision.
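
To make "long-term cumulative reward" concrete, here is a minimal sketch of a discounted return, the quantity reinforcement learning methods usually try to maximize. The function name, the discount factor `gamma` and the reward numbers are illustrative assumptions, not taken from any particular system.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum a sequence of rewards, discounting each by gamma per step of delay."""
    total = 0.0
    # Work backwards: the value of a step is its own reward plus the
    # discounted value of everything that follows it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical customer path: a costly offer now, a payoff two steps later.
rewards = [-1.0, 0.0, 5.0]
print(discounted_return(rewards, gamma=0.9))
```

The first action looks like a loss on its own, yet the sequence as a whole has positive value. That is the kind of trade-off a method that credits only the current decision cannot see.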

These advanced methods focus on repeated experimentation and prediction, and the resulting chains of actions ultimately produce much more complex decisions, strategies and outcomes. For example, these methods are used in robotics so that a robot can learn to stabilize, grasp and manipulate an object. These analytic methods mimic the way the brain learns complex task sequences through pleasurable or painful feedback signals that may occur later in time; essentially, this is how humans seek and achieve long-term positive results. Think about how you learned to ride a bike: gradually mastering balance, braking, mounting and dismounting (and falling safely).

Clearly, analytics that can “think” well ahead and focus on the most favorable long-term outcomes are highly valued. That’s particularly true in the many operational decisions about customers that have long-term consequences and where loyalty is earned over repeated interactions with an organization.

High customer lifetime value and healthy, sustainable cash flow are both produced by a series of interactions: The business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts … and so on. In this way, neuro-dynamic programming enables smart machines to think ahead -- potentially making moves early in the decision chain that may not appear optimal in the short run but lead to better decisions in the long term.

Another way to think about this concept is to consider a group of dumb software agents, similar to individual ants. The agents interact with their environment and are rewarded or penalized around a small set of success criteria. Gradually, sequences of successful behavior emerge as the agents begin to map out the risk and reward of various inter-related activities: many paths are explored, and non-optimal ones are learned and abandoned in the pursuit of the best chains of actions. Those agents with few successes receive a low “fitness” score and die out, whereas those with many successful sequences score high and are allowed to reproduce, mutate, or combine with other high-scoring agents. In this way, the overall performance of the group increases.

All the while, their environment is changing. So these agents not only act in the optimal way based on their current best “map of the world,” they also experiment to deal with these changing conditions. Using probabilities, they make slight variations and mutate around the optimal strategies. As these activities result in rewards and penalties, they learn from these experiments and adjust to a changing fitness landscape continually.
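
The agent analogy can be sketched as a small evolutionary search. This is an illustration under stated assumptions rather than a description of any production system: each agent is just a list of numbers, the environment is a hypothetical `target` vector, and `fitness`, `evolve` and `mutation_scale` are names invented for the example.

```python
import random


def fitness(agent, target):
    # Higher is better: negative squared distance to the environment's optimum.
    return -sum((a - t) ** 2 for a, t in zip(agent, target))


def evolve(population, target, rng, mutation_scale=0.1):
    """One generation: the fitter half survives, the rest is replaced."""
    ranked = sorted(population, key=lambda ag: fitness(ag, target), reverse=True)
    survivors = ranked[: len(ranked) // 2]  # low scorers die out
    children = []
    while len(survivors) + len(children) < len(population):
        mum, dad = rng.sample(survivors, 2)  # combine two high scorers
        # Take each trait from one parent, then mutate it slightly.
        child = [rng.choice(pair) + rng.gauss(0.0, mutation_scale)
                 for pair in zip(mum, dad)]
        children.append(child)
    return survivors + children


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
target = [0.5, -0.2, 0.1]  # assumed optimum; change it to mimic a shifting environment
for _ in range(30):
    population = evolve(population, target, rng)
print(max(fitness(agent, target) for agent in population))
```

Because the fitter half is carried over unchanged, the best score never gets worse while the target stays fixed. The random mutations are what let the group adapt if the target moves.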

As you can see in Figure 1, at any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess match, where a checkmate could be rooted 10 moves back — or even in the first move — the loss of a valuable customer may have started with actions taken months ago. To be successful, a business needs to understand and track this dynamic.

Figure 1: Learning to Make Better Decisions from Long-Term Results


Figure 2 depicts how these analytics learn about long-term effects by assigning credits for successful outcomes and penalties for unsuccessful ones. Although the action immediately before the outcome may receive a larger share of the credits or penalties, reinforcement learning principles require distribution of some amount of rewards/penalties across the entire sequence of actions.

Figure 2: Predicting the Outcome of a Sequence of Actions and Reactions
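
One simple way to picture this credit assignment is to split an outcome across the whole action sequence with weights that shrink the further an action is from the outcome. The sketch below is an assumption made for illustration: the geometric `decay` weighting, the normalization so that shares add up to the outcome, and the action names are not details given in the article.

```python
def assign_credit(actions, outcome, decay=0.5):
    """Split an outcome across a sequence of actions; the latest gets the most."""
    n = len(actions)
    # The weight of action i shrinks geometrically with its distance
    # from the final outcome.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    # Normalize so the shares add up to the full outcome.
    return [(action, outcome * w / total) for action, w in zip(actions, weights)]


# Hypothetical sequence ending in a successful outcome worth 7 units.
for action, credit in assign_credit(["intro_rate", "cross_sell", "limit_increase"], 7.0):
    print(action, credit)
```

A negative `outcome` distributes a penalty in the same proportions, so early actions also carry part of the blame for a lost customer.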


During training with historical data, the model learns to associate value (total discounted rewards and penalties) with a customer state and with each of the potential actions the business can take at that particular point. After training, when presented with new data on a customer indicating a given state, the model is able to predict the long-term value of taking one action over another -- and to select the best next action proven most likely to maximize the long-term value.
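
A minimal sketch of this training step uses tabular Q-learning over logged interactions. It is a stand-in for the neural-network models the article discusses, and the states, actions, rewards and parameter values below are all hypothetical.

```python
from collections import defaultdict


def train_q(history, actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Learn long-term values for (state, action) pairs from logged transitions."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in history:
            # Value of the best follow-up action; zero if the episode ended.
            best_next = 0.0 if next_state is None else max(
                q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            # Move the current estimate part of the way towards the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def best_next_action(q, state, actions):
    """Pick the action with the highest learned long-term value."""
    return max(actions, key=lambda a: q[(state, a)])


actions = ["discount", "no_offer"]
history = [  # (state, action, immediate reward, next state)
    ("new", "discount", -1.0, "loyal"),
    ("new", "no_offer", 0.0, "at_risk"),
    ("loyal", "no_offer", 5.0, None),
    ("at_risk", "no_offer", 1.0, None),
]
q = train_q(history, actions)
print(best_next_action(q, "new", actions))
```

In this toy history the discount costs money up front, but it leads to the "loyal" state and its larger later reward, so its learned long-term value ends up higher than that of making no offer.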

To improve business actions and results at a fast pace, analytics must have a way to learn causal relationships (this change in action A causes outcome Y to change in this specific way, usually expressed in expectations because Y is uncertain) from the data.

To do this, the algorithm performs a controlled amount of deliberate experimentation. While customers in similar states with similar characteristics would normally be targeted with the same action according to deterministic rules -- creating targeting bias and with it, difficulty in identifying causal effects -- advanced reinforcement-learning algorithms assign a small fraction of similar customers to somewhat different actions. In neuro-dynamic programming, these miniature experiments are essential. That’s because they help the neural networks (models that mimic brain function to process a large number of inputs, using high-speed computers and algorithms that learn to recognize complex patterns of behavior) to understand the causal relationships between states and actions, their effect on state-to-state transition probabilities, and thus on customer value.
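
A common way to implement this kind of controlled experimentation is an epsilon-greedy rule: most customers get the action currently believed best, and a small random fraction gets something else. The sketch below assumes that rule. The names `choose_action` and `epsilon` and the value table are illustrative, and the article does not say which exploration scheme is actually used.

```python
import random


def choose_action(state, q, actions, epsilon=0.05, rng=None):
    """Mostly exploit the best-known action; occasionally explore at random."""
    rng = rng or random
    if rng.random() < epsilon:
        # Deliberate experiment: pick any action, regardless of current beliefs.
        return rng.choice(actions)
    # Otherwise follow the current best estimate for this customer state.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical value estimates for one customer state.
q = {("at_risk", "call"): 2.0, ("at_risk", "email"): 1.0, ("at_risk", "wait"): 0.0}
actions = ["call", "email", "wait"]
rng = random.Random(42)
picks = [choose_action("at_risk", q, actions, epsilon=0.1, rng=rng) for _ in range(1000)]
print({a: picks.count(a) for a in actions})
```

Keeping `epsilon` small limits the cost of experimenting while still producing the randomized comparisons that expose causal effects.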

This is an essential component of curiosity -- trying different sequence paths to determine if a different approach, even if previously explored, now makes more sense given the changing business environment or operational realities.

This aspect of experimentation in neuro-dynamic programming is an analytic implementation of “curiosity.” This inquisitive algorithm likes to test new actions with a component of randomness, see how the real world responds, and adjust its concept of the world accordingly. Actively collecting and analyzing data, the algorithm is written with the freedom to improve. In other words, like humans, analytics can learn more from deliberate experimentation than from just passively observing the world -- doing so in controlled and sensible ways. This is important as we look to a future in which many reasonably expect machines to develop intuition and awareness as they enhance our world.

Moreover, AI can go beyond just extracting information from given, business-as-usual data. It can actively generate new information. In principle, these technologies can apply this approach even when no historical data is available at the starting point. They do so by directing the models to act and to learn on the fly from streaming production data based on expert knowledge of strategies and decision chains around customers. Analytic technologies already deployed in the fraud and cyber areas actively learn from their environments today to anticipate, react to, and counteract criminal activities.

Neuro-dynamic programming and related methods may be used to improve many areas of business operations (loan originations, customer management, marketing and collections, for example) where the standard practice is primarily single-shot decision modeling and optimization techniques that pinpoint the best next action, instead of modeling sequential decisions.

Adding AI techniques, we can move beyond the immediate consequences of the next action and start reasoning about chains of actions and reactions leading to long-term results. For example, we might test these techniques to help determine which sequence of introductory rate, go-to rate, cross-sell offer and credit limit increase is optimal for maximizing a particular customer’s lifetime value, and evaluate course corrections over time and as different telemetry on the customer changes their relationship with the bank.

The sequence of actions being optimized could take place over years or over the duration of a phone call. When a valuable customer calls to close her account, what is the optimal sequence of responses from the customer service agent to provide the best chance of retaining the business? We could build an algorithm to tell us, and it might improve on what a human agent would do when relying only on experience and training, the demeanors of the agent and customer, and so on.

The number of potential customer states becomes enormous as you move through a sequence of actions (and despite experimentation, these include many states not yet encountered or for which little historical data exist). Neural networks can be used to sift that data and predict the value of new customer states for possible actions that look similar to states and actions encountered in the past.

As an example, such a network can learn archetypes of customers who share important characteristics and can therefore be treated similarly, gaining more experience and perspective than any one human analyst can. Neural networks can also help in estimating the probability of a customer in one state transitioning to another state, and how business actions will affect that probability.
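
To show what "the probability of transitioning to another state" means in practice, here is a sketch that estimates it by simple counting over logged interactions. Counting is a deliberately simpler technique swapped in for the neural networks described above, which would be needed once states become too numerous to tabulate. The states, actions and log entries are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(history):
    """Estimate P(next_state | state, action) from observed frequencies."""
    counts = defaultdict(Counter)
    for state, action, next_state in history:
        counts[(state, action)][next_state] += 1
    return {
        key: {s: n / sum(c.values()) for s, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical log: what happened to at-risk customers after each action.
history = (
    [("at_risk", "call", "retained")] * 3
    + [("at_risk", "call", "closed")] * 1
    + [("at_risk", "none", "closed")] * 2
)
probs = transition_probabilities(history)
print(probs[("at_risk", "call")])
print(probs[("at_risk", "none")])
```

Comparing the two printed rows shows how a business action shifts the odds of where the customer ends up. Such a comparison only reflects a causal effect when the actions were assigned with some randomization.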

The combination of AI analytics and Big Data is exciting because it raises the possibility of greater and faster information extraction from large-scale data. As with any other analytic advancement, gains from Big Data AI will depend on smart analytic practices -- for example, high-quality, relevant data and expert analysts who guide model development and troubleshoot issues like data bias. Still, AI analytics is showing early potential to open the perspectives and hypotheses of human experts to new business possibilities, as well as to overcome certain data limitations such as targeting bias.

All in all, Big Data AI is very good news for business. Companies should welcome these developments and move forward as opportunities arise. The organizations that will benefit the most will be those already using neural networks, self-learning models, experimental design and decision optimization. If you’re not working with these analytic techniques now, think about getting started soon to position your business for the state of the art in management of customer relationships.
