Fraud, whether involving credit cards, cell phones or insurance claims, is a significant problem, but detecting and preventing it is difficult.

Why is it so difficult? After all, there does exist a simple, fast algorithm that correctly identifies all cases of fraud: Just label every transaction a fraud. True, you'll catch many honest transactions with this algorithm, but I didn't promise precision (catching only the fraudulent transactions), just inclusion (catching all the fraudulent transactions).

Fortunately, fraud is infrequent. Unfortunately, this low frequency makes it more difficult to detect. If one-tenth of one percent of all transactions were fraudulent, a data mining tool would be 99.9 percent accurate if it declared that no transactions were fraudulent. Now we've got the opposite problem from the "everything's a fraud" algorithm previously described.
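To make the arithmetic concrete, here is a minimal sketch that scores both degenerate algorithms on a made-up population of 10,000 transactions, 10 of them fraudulent (the one-tenth of one percent rate used above). The function names and the data are illustrative assumptions, not taken from any particular tool; "recall" is the standard term for what I called inclusion.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives; True means 'fraud'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


def summarize(actual, predicted):
    """Return accuracy, recall (inclusion) and precision for one labeling."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical population: 10 frauds among 10,000 transactions (0.1 percent).
actual = [True] * 10 + [False] * 9990

# "Nothing is a fraud": 99.9 percent accurate, but catches no fraud at all.
nothing_is_fraud = summarize(actual, [False] * len(actual))

# "Everything is a fraud": catches every fraud, but is almost never right.
everything_is_fraud = summarize(actual, [True] * len(actual))

print(nothing_is_fraud)
print(everything_is_fraud)
```

Neither labeling is useful, which is why accuracy alone is a poor yardstick when the event you care about is rare.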

A genuinely useful algorithm takes a lot more work. Even when fraud prevention isn't always possible, after-the-fact analysis can be very useful in helping to prevent further losses (and maybe even catching the perpetrators).

When modeling fraud, the first serious problem is differentiating between legitimate transactions and nearly identical fraudulent ones. For example, two credit card charges using the same number may appear virtually identical, but in one of the cases, the credit card is stolen.

The problem is exacerbated by the high cost of both false positives and false negatives. If I incorrectly reject a transaction, confiscate a credit card, or turn over a suspected fraud to an investigation unit (false positives), there is a high cost in customer goodwill or actual money, possibly even a lawsuit. Contrast this with direct marketing applications, where a false positive may merely mean mailing an offer to someone who's not really a prospect. On the other hand, I can't afford to miss many cases of fraud (false negatives), because each loss may be very costly.

The second issue is that fraud is an adaptive crime. Perpetrators learn what is and isn't working and change their behavior accordingly. The environment in which fraud occurs is also evolving: your business practices change, new people interpret actions differently, forms change, etc.

Because of the problem complexity, fraud detection is a multistep process:

  1. Identify suspicious transactions via database checks and data mining.
  2. Investigate.
  3. Act on the results.

You could, of course, skip the investigation step, and just rely on the identification. However, I would feel very uncomfortable defending a lawsuit with the statement "The neural net told me he was guilty."

Before you apply data mining, you need to explore your data and develop a set of business rules for identifying normal behavior versus suspicious behavior. With these, you can develop standard validation procedures such as checking your database to verify whether an address exists, whether a social security number is consistent with date and place of birth or when a particular property was sold and for how much.

You can't examine individual transactions exclusively; you also need to consider contextual information such as account-level data. A single call from New York to California may look just fine; however, if it is followed ten minutes later by a call on the same cell phone made from California to New York, something is definitely fishy.
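An account-level check of this kind might be sketched as follows. The record layout `(phone_id, timestamp, origin)` and the `MIN_TRAVEL_TIME` table are assumptions made up for illustration; a real system would derive plausible travel times from its own location data.

```python
from datetime import datetime, timedelta

# Hypothetical lookup: shortest plausible travel time between two regions.
MIN_TRAVEL_TIME = {
    frozenset({"New York", "California"}): timedelta(hours=5),
}


def implausible_moves(calls):
    """Flag consecutive calls on one phone made from places too far apart.

    calls: iterable of (phone_id, timestamp, origin) tuples, in any order.
    Returns a list of (phone_id, first_call_time, second_call_time).
    """
    by_phone = {}
    for phone_id, ts, origin in sorted(calls, key=lambda c: (c[0], c[1])):
        by_phone.setdefault(phone_id, []).append((ts, origin))

    flagged = []
    for phone_id, history in by_phone.items():
        # Compare each call with the next one on the same phone.
        for (t1, o1), (t2, o2) in zip(history, history[1:]):
            if o1 == o2:
                continue
            needed = MIN_TRAVEL_TIME.get(frozenset({o1, o2}))
            if needed is not None and t2 - t1 < needed:
                flagged.append((phone_id, t1, t2))
    return flagged


noon = datetime(2024, 1, 15, 12, 0)
calls = [
    ("phone-A", noon, "New York"),
    ("phone-A", noon + timedelta(minutes=10), "California"),
    ("phone-B", noon, "New York"),
    ("phone-B", noon + timedelta(minutes=10), "New York"),
]
suspicious = implausible_moves(calls)
print(suspicious)  # only phone-A is flagged
```

Each call on phone-A passes any single-transaction test; only the pair, viewed in the context of the account, is suspicious.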

Good database checks will stop a large amount of the fraud. Only when they are in place is it time to move to data mining.

Profiling requires understanding normal behavior in order to recognize abnormal behavior. How many purchases above a certain value does a credit card account typically make in a day or week or month? Are charges almost always made from within certain postal codes? Data mining contributes heavily by helping to build profiles that accurately characterize account behavior. Departures from these norms may indicate fraud. One departure may not mean much, but two or three such instances ought to attract your attention.
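As a rough illustration, a profile can be as simple as a few summary statistics per account. The sketch below assumes a transaction history of `(day, amount, postal_code)` tuples and uses made-up cutoffs (`large_amount`, `common_share`, and a three-standard-deviation limit); these are placeholders, not recommended values, and a data mining tool would typically learn something richer.

```python
from collections import Counter
from statistics import mean, pstdev


def build_profile(history, large_amount=500.0, common_share=0.05):
    """Summarize normal behavior from (day, amount, postal_code) records."""
    days = sorted({day for day, _, _ in history})
    large_per_day = Counter(
        day for day, amount, _ in history if amount >= large_amount
    )
    daily_counts = [large_per_day.get(day, 0) for day in days]
    code_counts = Counter(code for _, _, code in history)
    usual_codes = {
        code for code, n in code_counts.items()
        if n / len(history) >= common_share
    }
    return {
        "large_amount": large_amount,
        "mean_large": mean(daily_counts),
        "sd_large": pstdev(daily_counts),
        "usual_codes": usual_codes,
    }


def departures(profile, day_transactions, k=3.0):
    """List the ways one day's (amount, postal_code) records depart from the profile."""
    found = []
    large = sum(
        1 for amount, _ in day_transactions
        if amount >= profile["large_amount"]
    )
    if large > profile["mean_large"] + k * profile["sd_large"]:
        found.append("many_large_purchases")
    if any(code not in profile["usual_codes"] for _, code in day_transactions):
        found.append("unusual_postal_code")
    return found


# Hypothetical account: 30 quiet days, with one large purchase on day 10.
history = [(day, 40.0, "10001") for day in range(1, 31) for _ in range(2)]
history.append((10, 600.0, "10001"))
profile = build_profile(history)

today = [(700.0, "90210"), (650.0, "90210"), (900.0, "90210")]
print(departures(profile, today))
print(departures(profile, [(40.0, "10001")]))
```

Here the first day departs from the norm in two ways at once, which is the kind of coincidence that ought to attract attention; the second day shows none.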

Sequence analysis can also help uncover some behaviors that are characteristic of fraud. For example, a small gasoline purchase followed by an expensive purchase may indicate that someone was verifying that a charge card still worked before going after serious money. Other examples would be frequent combinations of the same doctors and lawyers in conjunction with certain types of insurance claims, or multiple sales of a real estate property in a short span of time with the value of each sale rapidly ratcheting up.
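The gasoline example can be expressed as a simple pattern over one card's time-ordered transactions. This is only a hand-written rule under assumed thresholds (`small_max`, `large_min`, `window`) and an assumed `(timestamp, category, amount)` record layout; sequence analysis in a data mining tool would discover such patterns rather than have them coded in advance.

```python
from datetime import datetime, timedelta


def probe_then_spend(transactions, small_max=5.0, large_min=500.0,
                     window=timedelta(hours=2)):
    """Find a small gasoline charge followed soon after by an expensive one.

    transactions: list of (timestamp, category, amount) for a single card.
    Returns a list of (probe, big_purchase) pairs.
    """
    ordered = sorted(transactions, key=lambda t: t[0])
    hits = []
    for i, probe in enumerate(ordered):
        if probe[1] != "gasoline" or probe[2] > small_max:
            continue
        for later in ordered[i + 1:]:
            if later[0] - probe[0] > window:
                break  # outside the time window; stop looking
            if later[2] >= large_min:
                hits.append((probe, later))
                break
    return hits


day = datetime(2024, 1, 15)
card_activity = [
    (day.replace(hour=9, minute=0), "gasoline", 2.00),
    (day.replace(hour=9, minute=30), "electronics", 1200.00),
    (day.replace(hour=14, minute=0), "gasoline", 45.00),
    (day.replace(hour=14, minute=20), "grocery", 60.00),
]
hits = probe_then_spend(card_activity)
print(hits)  # the 9:00 gasoline charge paired with the 9:30 electronics charge
```

The afternoon fill-up and grocery run do not match, because the gasoline charge is an ordinary amount and nothing expensive follows it.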

No single test is reliable enough to catch every fraud and only frauds. You'll need to combine the results from individual flags to create a score which raises a red flag when it exceeds a threshold. The behaviors you incorporate, the weights you give to atypical behaviors, and the level at which a warning flag is raised will constantly change, reflecting the changes in your business and the perpetrators' actions.
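A minimal sketch of such a score, assuming each individual test reports a named flag: the flag names, the integer weights and the threshold below are all invented for illustration, and in practice they are exactly the values you would keep revisiting.

```python
# Hypothetical weights for individual flags; higher means more suspicious.
WEIGHTS = {
    "unusual_postal_code": 4,
    "many_large_purchases": 5,
    "probe_then_spend": 7,
}

# Hypothetical level above which a transaction goes to investigation.
THRESHOLD = 8


def fraud_score(flags, weights=WEIGHTS):
    """Add up the weights of the flags that were raised; unknown flags count 0."""
    return sum(weights.get(name, 0) for name in flags)


def needs_investigation(flags, weights=WEIGHTS, threshold=THRESHOLD):
    """Raise a red flag only when the combined score exceeds the threshold."""
    return fraud_score(flags, weights) > threshold


print(needs_investigation({"unusual_postal_code"}))
print(needs_investigation({"unusual_postal_code", "many_large_purchases"}))
```

With these numbers, one atypical behavior scores 4 and is ignored, while two together score 9 and trigger an investigation. Keeping the weights and threshold as data rather than burying them in code makes them easier to adjust as your business and the perpetrators change.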
