Profiling the Network: Using Analytics to Know Who's Really Who

Cyber-attacks show no signs of slowing down, and organizations continue to look at any and all approaches to improving security. Securing the network is fundamental to protecting the business, and a variety of tools exist to understand traffic flow over a network and to analyze security impacts from that flow.

Despite the capabilities of these tools, however, attacks and breaches continue to happen on a weekly basis. It is time to expand the definition of network profiling to include the riskiest asset on the network: the user.

A user-centric view of the network helps answer these key questions:

Who is on the network?

The answer might not be as easy as it seems. Users have composite identities made up of information from multiple accounts, applications, and repositories.

In a large company, the average employee’s identity might include a Windows ID (stored in Active Directory), as well as separate accounts for apps such as ADP (for payroll), a CRM system, Concur (for travel), Oracle (for accounting), etc. The user might have yet another ID for his iPad, which he has brought from home under the firm’s BYOD policy.

It’s the rare organization that tracks all of these official identities in one location. Even worse, the user might have unofficial identities that aren’t tracked anywhere.

Examples include shared admin accounts and Unix IDs that are unrelated to the user’s Windows account. If an employee uses all of these throughout the workday, across his work and personal devices, how will the firm connect them? And if it can’t, how will it know who, exactly, is on its network?

What are they accessing?

Again, this seems like a straightforward request. Users access servers and applications over the network. But just as most firms have only a limited understanding of the users on the network, they also have limited understanding of the assets on the network.

In 25 years working with enterprise technologies, I’ve met perhaps a handful of firms that have a central, well-maintained CMDB. Most firms use a variety of systems to track assets, and even these only have limited information.

For example, IT might know that a user is accessing server svr_2032, and perhaps might know which human is tied to the ID used for access. But it’s unlikely that the firm knows that the server is often used by the CFO, and that it has financial data stored there.

Are they acting normally?

In the rare instances where a firm knows exactly who’s on the network and what they are accessing, it’s even harder to answer the question of normal behavior: is this person supposed to be doing what they are doing?

This is a question that is very hard to answer from a network profiling perspective. Flow data might give some insight into traffic, but it doesn’t connect to identity and it doesn’t provide context around behavior.

Put together, network profiling benefits from the ability to answer, in detail: who is on the network, what are they accessing, are they supposed to be doing so and, most importantly, what does this imply for risk? In theory, these questions should be easy to answer; in practice, they have historically been extremely difficult. Today they are much easier to answer, because all of the puzzle pieces now exist.

Advances in data science, combined with computing power and applied to data already collected within most organizations, can connect the dots and provide a useful profile of network user activity.

While data science -- i.e. machine learning -- has become an overused buzzword, in practice it can provide very useful answers in certain applications. For example, machine learning can discover the connections between seemingly unrelated bits of identities, to create a map of all of a user’s activities, even when the identity components are not explicitly linked.

As an example, Fred logs into his Windows machine on Monday morning, receives an IP address, and later performs a remote login to a Linux box using an unrelated admin account. Previously, Fred’s Windows ID and the Linux account ID might never have been connected, and network activities from both machines would have appeared unrelated. Today, machine learning engines can connect them automatically and provide tracking and a broader view of Fred’s true activity on the network.
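To make the Fred scenario concrete, here is a minimal sketch of the underlying idea: join two log sources on the IP address and on time. It is a simple deterministic rule, not a machine learning engine, and it is not taken from any particular product. The `Event` record, the `link_accounts` function, the eight-hour window, and all sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One login record (hypothetical schema)."""
    time: datetime
    account: str
    ip: str    # desktop login: IP assigned to the machine; remote login: source IP
    host: str


def link_accounts(desktop_logins, remote_logins, window=timedelta(hours=8)):
    """Map each desktop account to the remote accounts used from its IP.

    A remote login is attributed to the desktop account that most recently
    logged in on the same IP, provided that login is no older than `window`.
    """
    links = {}
    for remote in remote_logins:
        candidates = [
            d for d in desktop_logins
            if d.ip == remote.ip
            and timedelta(0) <= remote.time - d.time <= window
        ]
        if candidates:
            owner = max(candidates, key=lambda d: d.time)
            links.setdefault(owner.account, set()).add(remote.account)
    return links


# Illustrative data: Fred's Monday-morning Windows login, then a Linux admin login.
desktop = [Event(datetime(2024, 1, 8, 8, 55), "CORP\\fred", "10.0.4.17", "fred-pc")]
remote = [Event(datetime(2024, 1, 8, 10, 30), "svc_admin", "10.0.4.17", "linux-01")]
links = link_accounts(desktop, remote)
```

Here `links` ties `svc_admin` back to `CORP\fred`. A real deployment would have to cope with shared machines, address reuse, and NAT, which is where statistical techniques earn their keep over a fixed rule like this one.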

Other techniques can create baselines of normal behavior for every user on the network, making it easier to understand whether each user is acting normally or not. Still other techniques can build better asset models, including which machines are likely “executive assets” and at higher risk of attack.
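One simple way to picture a behavioral baseline is a per-user mean and standard deviation over some daily activity count, with a flag raised when today's value falls far outside that range. The sketch below is only one assumed approach among many; the function names, the choice of metric (distinct servers accessed per day), and the three-sigma threshold are all illustrative assumptions rather than a description of any specific tool.

```python
from statistics import mean, stdev


def build_baselines(history):
    """history: {user: [daily counts]} -> {user: (mean, sample std dev)}.

    Users with fewer than two observations get no baseline.
    """
    return {
        user: (mean(counts), stdev(counts))
        for user, counts in history.items()
        if len(counts) >= 2
    }


def is_unusual(baselines, user, todays_count, threshold=3.0):
    """Return True if today's count deviates strongly from the user's baseline."""
    if user not in baselines:
        return True  # assumption: no baseline yet is itself worth a look
    mu, sigma = baselines[user]
    if sigma == 0:
        return todays_count != mu
    return abs(todays_count - mu) / sigma > threshold


# Illustrative data: distinct servers Fred accessed on each of the last seven days.
history = {"fred": [4, 5, 6, 5, 4, 6, 5]}
baselines = build_baselines(history)
normal_day = is_unusual(baselines, "fred", 5)
odd_day = is_unusual(baselines, "fred", 40)
```

With this data, a day with 5 servers is within Fred's baseline, while a day with 40 is flagged. A single count is a crude signal; in practice a baseline would likely combine several features, such as time of day, which assets are touched, and what peers in the same role do.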

In theory, any and all of these could have been done with previous technologies. In practice, limits in computing power and machine learning techniques prevented this level of awareness.

The good news is that today it is quite possible to understand in great depth and with deep context exactly who is on the network; what they are doing; whether they should be doing it; and what it means to an organization’s risk and security posture.

The data is already being generated and collected, and the contextual bits exist. The tools and techniques to tie everything together have come to market and are providing value today. As a result, the definition of network profiling must change, and the results can be quite positive.
