Traditionally, analysts in retail, manufacturing and many other industries have used a variety of statistical methods to solve problems in forecasting, data classification and pattern recognition. These methods include regression analysis, logistic regression, survival and reliability analysis, and Auto-Regressive Integrated Moving Average (ARIMA) modeling. However, because each method relies on different software algorithms with different data assumptions, forecasters must learn an assortment of tools to solve problems and produce answers.

Fortunately, neural networks can replace all of these methods and produce forecasts as accurate as, or better than, those available from other statistical methods. In fact, neural networks offer many advantages: improved accuracy over traditional statistical methods; a unified approach to a wide variety of predictive analytics problems; and fewer statistical assumptions, which lets them manage complex predictive analytics tasks in a more automated way and saves time for analysts and programmers. We’ll take a look at what neural networks are and why they’re suited for certain kinds of analytics, particularly predictive analytics.

**How Neural Networks Work**

Predictive analytics, pattern recognition and classification problems are not new. They existed years before the commercial application of neural network solutions in the 1980s. In fact, neural networks were conceived much earlier: McCulloch and Pitts wrote one of the first published works on artificial neural networks in 1943. In their paper, they describe a threshold neuron as a model for how the human brain stores and processes information. Neural networks were designed to mimic how the brain learns and analyzes information. For decades following their paper, however, interest in the McCulloch-Pitts neural network was limited largely to theoretical discussion.

Among many other benefits, applying artificial neural networks to predictive analytics gives analysts a single framework for solving many traditional problems and, in some cases, extends the range of problems that can be solved. Once trained, a neural network is efficient and accurate in circumstances where complex predictive analytics is required. This is because, much like our brains, neural networks are composed of a series of interconnected calculating nodes designed to map a set of inputs into one or more output signals. These nodes are referred to as perceptrons.

In many cases, simple neural network configurations yield the same solution as many traditional statistical applications. For example, a single-layer, feed-forward neural network with linear activation for its output perceptron is equivalent to a general linear regression fit.
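This equivalence can be sketched with made-up data: training a single linear perceptron by minimizing squared error has the same closed-form solution as ordinary least squares, so the fitted weight and bias are just the regression coefficients.

```python
import numpy as np

# Hypothetical data: one input attribute, one continuous output,
# generated from y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=50)

# A single linear perceptron computes y_hat = w*x + b, which is
# exactly the linear regression model; add a bias column.
X = np.column_stack([x, np.ones_like(x)])

# "Training" this network under squared error is the ordinary
# least-squares solution.
w, b = np.linalg.lstsq(X, y, rcond=None)[0]

print(w, b)  # close to the true slope 3 and intercept 2
```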

Although neural network solutions for predictive analytics, pattern recognition and classification problems can be very different, they are always the result of computations that proceed from the network inputs to the network outputs. The network inputs are referred to as patterns, and outputs are referred to as classes. Frequently the flow of these computations is in one direction, from the network input patterns to its outputs. Networks with forward-only flow are referred to as feed-forward networks.

Other networks, such as recurrent neural networks, allow data and information to flow in both directions.

A neural network is defined not only by its architecture and flow, or interconnections, but also by the computations used to transmit information from one node (or input) to another. These computations are determined by network weights. The process of fitting a network to existing data to determine these weights is referred to as training the network, and the data used in this process are referred to as patterns. Individual network inputs are referred to as attributes, and outputs are referred to as classes. Many terms used to describe neural networks are synonymous with common statistical terminology.
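A minimal sketch of training in this sense, using hypothetical patterns and a single perceptron with a logistic output: gradient descent repeatedly nudges the network weights so the outputs move toward the target classes.

```python
import numpy as np

# Hypothetical training patterns: two attributes per pattern, with
# a binary class label derived from a linear rule.
rng = np.random.default_rng(1)
patterns = rng.normal(size=(100, 2))                            # attributes
targets = (patterns @ np.array([1.5, -2.0]) > 0).astype(float)  # classes

w = np.zeros(2)   # network weights, determined by training
b = 0.0           # bias weight
lr = 0.5          # learning rate (an illustrative choice)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):                      # training iterations
    p = sigmoid(patterns @ w + b)         # forward pass
    grad_w = patterns.T @ (p - targets) / len(targets)
    grad_b = np.mean(p - targets)
    w -= lr * grad_w                      # adjust the weights
    b -= lr * grad_b

accuracy = np.mean((sigmoid(patterns @ w + b) > 0.5) == targets)
print(accuracy)
```

After training, the learned weights map input patterns to the correct class for nearly all of the training data.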

**When Traditional Predictive Analytics Methods Don’t Work**

There are many good statistical tools for forecasting and predictive analytics. However, most require assumptions about the relationship between the variables being forecasted and the variables used to produce the forecast, as well as the distribution of forecast errors.

As a result, there are certain instances where traditional statistical methods are unsuitable. For example, traditional methods for time-series forecasting do not work when a time series is nonstationary, has large amounts of noise (such as a biomedical series) or is too short.

ARIMA time-series models, in particular, assume that the time series is stationary, that the forecast errors follow a particular ARIMA model and that the probability distribution of the residual errors is Gaussian. If these assumptions are invalid, ARIMA time-series forecasts can be very poor.

**Using Neural Networks for Predictive Analytics**

Neural networks, on the other hand, require few assumptions. Since neural networks can approximate highly nonlinear functions, they can be applied without an extensive analysis of underlying assumptions.

Neural networks can also adapt to changes in a nonstationary series and can produce reliable forecasts even when the series contains a good deal of noise or when only a short series is available for training. Additionally, they provide a single tool for solving many problems that are traditionally solved using a wide variety of statistical tools.

Another advantage of neural networks over ARIMA modeling is the number of observations needed to produce a reliable forecast. ARIMA models generally require 50 or more equally spaced, sequential observations in time. In many cases, neural networks can also provide adequate forecasts with fewer observations by incorporating exogenous, or external, variables in the network’s input.

For example, a company applying ARIMA time-series analysis to forecast business expenses would normally require each of its departments and each sub-group within each department to prepare its own forecast. For large corporations, this can require fitting hundreds or even thousands of ARIMA models. With a neural network approach, the department and subgroup information could be incorporated into the network as exogenous variables. Although this can significantly increase the network’s training time, the result would be a single model for predicting expenses within all departments and subdepartments.
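One way to set this up, as a hedged sketch with hypothetical department names and numbers: encode the department identity as extra exogenous inputs (here one-hot codes) appended to each pattern, so a single network sees which department an expense history belongs to.

```python
# Hypothetical departments; a real corporation would have many more,
# possibly with a second code for the sub-group.
departments = ["sales", "ops", "it"]

def make_pattern(dept, recent_expenses):
    """Build one network input pattern: lagged expense values plus a
    one-hot department code appended as exogenous variables."""
    one_hot = [1.0 if dept == d else 0.0 for d in departments]
    return list(recent_expenses) + one_hot

# Three months of expenses for the "ops" department.
pattern = make_pattern("ops", [120.0, 130.0, 125.0])
print(pattern)  # [120.0, 130.0, 125.0, 0.0, 1.0, 0.0]
```

Every department's history becomes training data for the same network, which is what makes a single shared model possible.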

**Neural Networks’ Role in Pattern Recognition, Classification**

Neural networks are also extensively used in statistical pattern recognition. Pattern recognition applications that use neural networks include natural language processing, speech and text recognition, face recognition, playing backgammon and classifying financial news.

The interest in pattern recognition using neural networks has stimulated the development of important variations of feed-forward networks. Two of the most popular are Self-Organizing Maps (also called Kohonen Networks) and Radial Basis Function Networks.

Classifying observations using prior concomitant information is possibly the most popular application of neural networks. Data classification problems abound in business and research, and whenever a decision must be based upon data, it can often be treated as a neural network data classification problem. Deciding to buy, sell, hold or do nothing with a stock is a decision involving four choices; classifying loan applicants as good or bad credit risks, based upon their applications, is a classification problem involving two choices. Neural networks are powerful tools for making decisions or choices based upon data.

These same tools are ideally suited for automatic selection or decision-making. Incoming email, for example, can be examined to separate spam from important email using a neural network trained for this task.

**Solving Data Classification Problems with Neural Networks**

There are two popular methods for solving data classification problems using multilayer feed-forward neural networks, depending upon the number of choices (classes) in the classification problem. If the classification problem involves only two choices, then it can be solved using a neural network with a single logistic output. This output estimates the probability that the input data belong to one of the two choices.

For example, a multilayer feed-forward network with a single logistic output can be used to determine whether a new customer is credit-worthy. The network’s input would consist of information on the applicant’s credit application, such as age, income, etc. If the network output probability is above some threshold value (such as 0.5 or higher) then the applicant’s credit application is approved.
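The decision rule in this example can be sketched as follows; the attribute names, weights and threshold are purely illustrative, standing in for a trained network's output layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical trained weights for three application attributes:
# age, income, years at current job. Values are illustrative only.
weights = np.array([0.02, 0.00004, 0.1])
bias = -3.0

def approve(applicant, threshold=0.5):
    """The logistic output estimates the probability that the
    applicant is credit-worthy; approve when it clears the threshold."""
    prob = sigmoid(applicant @ weights + bias)
    return prob >= threshold, prob

# A 45-year-old earning 60,000 with 10 years on the job.
decision, prob = approve(np.array([45.0, 60000.0, 10.0]))
print(decision, prob)
```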

This is referred to as binary classification using a multilayer feed-forward neural network. If more than two classes are involved then a different approach is needed. A popular approach is to assign logistic output perceptrons to each class in the classification problem. The network assigns each input pattern to the class associated with the output perceptron that has the highest probability for that input pattern. However, this approach produces invalid probabilities because the sum of the individual class probabilities for each input is not equal to one, which is a requirement for any valid multivariate probability distribution.
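The invalid-probability problem is easy to demonstrate with hypothetical scores from three per-class output perceptrons: each logistic output is a valid probability on its own, but together they do not sum to one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores from three per-class output perceptrons
# for a single input pattern.
scores = np.array([2.0, 0.5, -1.0])

probs = sigmoid(scores)   # independent logistic outputs
print(probs.sum())        # well above 1.0 -- not a valid joint distribution

# The class assignment rule still works: pick the largest output.
predicted_class = int(np.argmax(probs))
print(predicted_class)    # 0
```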

**Training Neural Networks Through Error Calculations**

Lastly, the error calculations used to train a neural network are very important. Many error calculations have been researched in an effort to find one with a short training time appropriate for the network’s application. Error calculations typically differ a great deal, depending primarily on the network’s application.

For predictive analytics, the most popular error function is the sum-of-squared errors or one of its scaled versions. This is analogous to using the minimum least squares optimization criterion in linear regression. Like least squares, the sum-of-squared errors is calculated by looking at the squared difference between what the network predicts for each training pattern and the target value, or observed value, for that pattern.

However, because the sum-of-squared errors can inflate the influence of data outliers, the use of this calculation can cause inflated network training times and poor forecasts. In these cases, the sum-of-absolute differences, referred to as the Laplacian error, can produce better results.
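The contrast between the two error calculations shows up directly on hypothetical forecasts containing one outlier: squaring the residuals inflates the outlier's contribution far more than taking absolute values does.

```python
import numpy as np

# Hypothetical targets and network predictions; the last prediction
# is a large outlier.
targets = np.array([10.0, 11.0, 12.0, 13.0])
predictions = np.array([10.5, 10.8, 12.2, 30.0])

residuals = predictions - targets
sse = np.sum(residuals ** 2)           # sum-of-squared errors
laplacian = np.sum(np.abs(residuals))  # sum-of-absolute (Laplacian) error

# The single outlier dominates the squared-error total but
# contributes only linearly to the Laplacian error.
print(sse, laplacian)
```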

As business analysts continue to search for the most accurate and efficient ways to perform forecasting and predictive analytics, neural networks provide a smart, powerful alternative to traditional statistical tools. And the more that neural networks analyze data, the smarter these networks will become, which in turn will help analysts make increasingly better business decisions.
