You are a security analyst, sitting in the SOC, and you receive an alert that the user on machine 65.43.55.1 is accessing the customer database and initiating a backup. Should you worry?

It seems like an easy question to answer: either this user is supposed to be taking backups of the customer database and all is well, or we have a security problem. Unfortunately, in many cases today it’s quite difficult to answer the simple question: is this normal behavior or not?

While no security professional secretly pines for the days of viruses and SQL injections, there was a certain simplicity to cyber-attacks a decade ago. That is, it was usually easy to see that a particular action was unwanted and unpleasant. Attacks were transactional: a bad guy enters a certain SQL string into your website and immediately receives thousands of customer records.

Today’s threats are much harder to evaluate. They span multiple machines, may run for weeks or months, and may cross multiple accounts and identities. “Is this normal?” is difficult to answer from a single log event when that event is one link in a chain spanning weeks.

A customer described the problem well. “When we get an incident, we need to understand where this user came from, where he went after, and most importantly, was this normal behavior or not?” To understand the normal, we first must connect pieces of information that are rarely in one place. Incident responders typically begin by pulling domain controller records to understand who had a particular IP address at the time of the incident.
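
That first lookup, mapping an IP address to whoever held it at the time of the incident, can be sketched in a few lines. This is only an illustration: the `Lease` record, the `user_for_ip` function, and the sample addresses and user names are hypothetical stand-ins for whatever your domain controller or DHCP logs actually contain.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical record: one user holding one IP for a time window."""
    ip: str
    user: str
    start: datetime
    end: datetime


def user_for_ip(leases: Iterable[Lease], ip: str, at: datetime) -> Optional[str]:
    """Return the user who held `ip` at time `at`, or None if unknown."""
    for lease in leases:
        # Half-open window: the lease covers start <= at < end.
        if lease.ip == ip and lease.start <= at < lease.end:
            return lease.user
    return None


# Made-up sample data for illustration only.
LEASES = [
    Lease("10.0.0.5", "alice", datetime(2024, 1, 8, 8, 0), datetime(2024, 1, 8, 12, 0)),
    Lease("10.0.0.5", "bob", datetime(2024, 1, 8, 12, 0), datetime(2024, 1, 8, 18, 0)),
]

print(user_for_ip(LEASES, "10.0.0.5", datetime(2024, 1, 8, 13, 30)))  # bob
```

The same address resolves to a different user depending on the timestamp, which is why the time of the incident matters as much as the address itself.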

Next, they might search logs to assemble what happened before and after the incident for that IP. Harder still is connecting movement from the user’s workstation to a remote server under an unrelated account. After a great deal of manual effort, the IR analyst might have a coherent timeline of events leading up to and away from an incident. Unfortunately, that timeline is unlikely to indicate whether the event chain is normal for this employee.
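
To make the stitching step concrete, here is a rough sketch of pulling one person’s events together across accounts and ordering them in time. It assumes the hard part is already done: a mapping from account names to the person behind them. The `ACCOUNT_OWNER` table, the event fields, and every name and host below are invented for illustration.

```python
from datetime import datetime

# Hypothetical mapping from account names to the person behind them.
ACCOUNT_OWNER = {
    "jsmith": "John Smith",
    "ops_admin": "John Smith",  # unrelated-looking account, same person
    "mlee": "Mary Lee",
}

# Made-up events from different log sources, deliberately out of order.
EVENTS = [
    {"time": datetime(2024, 1, 8, 9, 20), "account": "ops_admin",
     "host": "db01", "action": "start backup"},
    {"time": datetime(2024, 1, 8, 9, 0), "account": "jsmith",
     "host": "ws-114", "action": "logon"},
    {"time": datetime(2024, 1, 8, 9, 10), "account": "mlee",
     "host": "ws-220", "action": "logon"},
    {"time": datetime(2024, 1, 8, 9, 15), "account": "ops_admin",
     "host": "db01", "action": "remote logon"},
]


def build_timeline(events, account_owner, person):
    """Collect every event tied to `person` across accounts, oldest first."""
    mine = [e for e in events if account_owner.get(e["account"]) == person]
    return sorted(mine, key=lambda e: e["time"])


for event in build_timeline(EVENTS, ACCOUNT_OWNER, "John Smith"):
    print(event["time"].isoformat(), event["account"], event["host"], event["action"])
```

The sorting is trivial. In practice the effort goes into building the account-to-person mapping, which is exactly the information that is rarely in one place.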

Piecing the timeline together is an important step. Equally important is understanding whether each activity within the chain is normal for that particular user. If a database administrator is accessing and backing up my customer database, that might be normal and acceptable. If that same database is backed up by a call center rep, the situation is quite different. To determine “normal,” our IR analyst may need to run many more searches and queries on historical data, pump the results into a reporting system, and then analyze trends to better understand the potential risk.
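
A very simple version of that per-user check might look like the sketch below. It is plain counting over historical data, not a real behavioral model, and the user names, action labels, and the `min_count` cutoff are all assumptions made for the example.

```python
from collections import Counter

# Made-up history of (user, action) pairs drawn from past logs.
HISTORY = [
    ("dba_kim", "db_backup"), ("dba_kim", "db_backup"), ("dba_kim", "db_query"),
    ("rep_omar", "db_query"), ("rep_omar", "db_query"), ("rep_omar", "ticket_update"),
]


def build_baseline(history):
    """Count how often each user performed each action."""
    return Counter(history)


def is_normal(baseline, user, action, min_count=2):
    """Treat an action as normal if the user did it at least `min_count` times."""
    # Counter returns 0 for pairs it has never seen.
    return baseline[(user, action)] >= min_count


baseline = build_baseline(HISTORY)
print(is_normal(baseline, "dba_kim", "db_backup"))   # True
print(is_normal(baseline, "rep_omar", "db_backup"))  # False
```

The same action, a database backup, comes out as routine for the administrator and unusual for the call center rep, because the answer depends on each user’s own history.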

This assumes that the organization has actually retained enough data to perform this level of analysis. In many firms, the volume of data is such that only 30 days’ worth is kept at any time. If the firm retains more, the sheer volume may overwhelm the reporting system, and the human analyst might simply miss important trends in the sea of events she’s trying to analyze. In short, the inability of most SOCs and IR teams to process enough data to understand every user’s normal behavior creates a gaping security hole. Most organizations can’t detect complex threats because they can’t put user behavior in context.

“Understanding the normal” is an area where advances in machine learning can be extremely helpful. Algorithms can help produce context from a corporate network: is this a server or a workstation? Is this a human or a service account? Is this an admin or a normal user? Is the person using account A today the same person using account B? Machine learning can connect events into coherent sessions, and the combination of algorithms and statistical analysis can produce very useful baselines of normal behavior.

With a picture of normal behavior, it becomes much easier to evaluate activities as anomalous. Even better, machine scoring can weigh those anomalies to reduce noise and operator alert fatigue.
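
As a rough illustration of weighted scoring, the sketch below scores a session by how rare each action is for the user, multiplied by how much that action matters. It uses simple frequency statistics rather than machine learning, and the weights, baselines, and alert threshold are invented values that would need tuning on real data.

```python
from collections import Counter

# Hypothetical weights: how much an unusual instance of each action matters.
WEIGHTS = {"db_backup": 5.0, "remote_logon": 3.0, "db_query": 1.0}


def rarity(baseline: Counter, action: str) -> float:
    """1.0 for an action never seen for this user, lower for routine ones."""
    total = sum(baseline.values())
    if total == 0:
        return 1.0  # no history at all: treat everything as unusual
    return 1.0 - baseline[action] / total


def session_score(baseline: Counter, actions) -> float:
    """Sum weighted rarity over every action in a session."""
    return sum(WEIGHTS.get(a, 1.0) * rarity(baseline, a) for a in actions)


# Made-up per-user baselines: counts of past actions.
rep_baseline = Counter({"db_query": 90, "ticket_update": 10})
dba_baseline = Counter({"db_backup": 50, "db_query": 40, "remote_logon": 10})

session = ["remote_logon", "db_backup"]
ALERT_THRESHOLD = 6.0  # assumed cutoff; would need tuning on real data

for name, base in [("call center rep", rep_baseline), ("database admin", dba_baseline)]:
    score = session_score(base, session)
    print(name, round(score, 2), "ALERT" if score >= ALERT_THRESHOLD else "ok")
```

With these numbers the identical session scores 8.0 for the call center rep and 5.2 for the database administrator, so only the first crosses the threshold. Scoring a whole session rather than alerting on each event is one way to cut down on noise.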

My next piece will go into the applications of machine learning in greater detail, but for now, let me summarize. Understanding each user’s normal behavior is challenging in the current IT environment. It’s hard to fit the pieces together and to track the whole over time. Machine learning is providing useful solutions in this regard, and the results can make the SOC analyst’s job significantly easier.

(About the author: Nir Polak is CEO and cofounder of Exabeam)
