Machine learning has never been more accessible than it is right now. Amazon uses it to uncover shopping habits, and Netflix uses it to suggest personalized movie selections. Many tech giants, in both consumer and business-facing markets, are using machine learning to build intelligent processes that tackle everything from more targeted search engine results to bigger challenges like climate change and cancer diagnostics.

Leaders in the cybersecurity space are utilizing machine learning in a similar fashion. While there is a lot of buzz in the marketplace about the potential of the approach to solve persistent issues like silent failure and false positives, there are many misconceptions about how the technology is being applied in the field.

Algorithms Are Not a Panacea

The value that machine learning can bring to the table largely depends on the data available to feed into it. Machine learning cannot create knowledge; it can only extract it. The scope and size of the data are the most critical factors for effective machine learning.

For example, solutions that only analyze file contents easily fall prey to obfuscation techniques and will miss breaches that are purely exploitation-based and do not involve malware. Similarly, solutions that only consider behaviors observed on a single host or in a single sandbox are at a disadvantage compared with solutions that analyze behaviors in the cloud, at large scale, across a vast array of deployments.

With sufficient high-quality data available, machine learning techniques easily outperform traditional signature-based or indicator-of-compromise (IoC) based approaches, which retroactively seek out the artifacts an attacker leaves behind during a breach. In contrast, machine learning can drive the creation of indicators of attack (IoAs), which identify active attacks by looking for the effects of what an adversary is trying to accomplish. Attackers can easily change artifacts. They cannot easily change their objectives. The more varied the input sources that feed into machine learning, the more accurate the IoAs it can generate.
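The difference between the two kinds of indicator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` type, the hash values, the action names and the behavior chain in `IOA_SEQUENCE` are all hypothetical, and the chain is written by hand here rather than derived by a machine learning model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    action: str   # hypothetical event type, e.g. "file_write"
    detail: str   # hypothetical payload, e.g. a file hash or process name


# IoC: a list of artifacts (here, made-up file hashes) seen in past breaches.
KNOWN_BAD_HASHES = {"hash-of-old-dropper"}

# IoA: an ordered chain of behaviors that reflects the attacker's objective.
IOA_SEQUENCE = ("document_opened", "script_interpreter_started", "outbound_connection")


def ioc_match(events):
    """True if any written file carries a known-bad hash."""
    return any(e.action == "file_write" and e.detail in KNOWN_BAD_HASHES for e in events)


def ioa_match(events):
    """True if the IOA_SEQUENCE actions occur in order, not necessarily adjacent."""
    actions = iter(e.action for e in events)
    # `step in actions` consumes the iterator up to the first match,
    # so each step must appear after the previous one.
    return all(step in actions for step in IOA_SEQUENCE)


def attack(dropper_hash):
    """Build the same hypothetical attack with a given dropper hash."""
    return [
        Event("document_opened", "invoice.doc"),
        Event("file_write", dropper_hash),
        Event("script_interpreter_started", "powershell.exe"),
        Event("outbound_connection", "203.0.113.7"),
    ]


original = attack("hash-of-old-dropper")
repacked = attack("hash-of-new-dropper")  # same behavior, different artifact

print(ioc_match(original), ioc_match(repacked))  # the IoC only catches the old artifact
print(ioa_match(original), ioa_match(repacked))  # the IoA catches both
```

Changing a single artifact is enough to slip past the hash list, while the behavior chain still matches because the attacker's objective has not changed.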

This does not imply that one should only consider machine learning to find threats. Machine learning is an important tool in the arsenal, but not all classes of problems are solved by it. For example, on a Windows system there exist only a limited number of ways an attacker can get access to user credentials. All the routes open to an attacker can be expressed using IoAs by skilled experts without the need to sift through petabytes of data.
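An expert-written IoA of this kind needs no training data at all. The sketch below encodes one well-known credential-theft route, a process reading the memory of lsass.exe; it is not an exhaustive list of routes, and the `ProcessAccess` type, its field names and the allowlist entry are hypothetical placeholders rather than any product's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessAccess:
    source_process: str       # process requesting access (hypothetical field)
    target_process: str       # process being accessed (hypothetical field)
    wants_memory_read: bool   # whether memory-read rights were requested


# Placeholder allowlist; a real deployment would have to curate its own.
ALLOWED_READERS = frozenset({"trusted_agent.exe"})


def is_credential_access_ioa(evt: ProcessAccess) -> bool:
    """Flag a non-allowlisted process that asks to read lsass.exe memory."""
    return (
        evt.target_process.lower() == "lsass.exe"
        and evt.wants_memory_read
        and evt.source_process.lower() not in ALLOWED_READERS
    )


suspicious = ProcessAccess("unknown_tool.exe", "lsass.exe", True)
benign = ProcessAccess("trusted_agent.exe", "lsass.exe", True)
print(is_credential_access_ioa(suspicious), is_credential_access_ioa(benign))
```

The rule describes what the attacker must do to reach the credentials, not which tool is used, so renaming or recompiling the tool does not evade it.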

Speed and Scale Matter

To analyze billions of events swiftly and accurately in real time, machine learning models require a level of computational power and scalability that cannot be achieved with old-school on-premises architecture and conventional database methods. Cloud-based architectures can significantly augment the efficacy of machine learning. Algorithms can be infused with the collective knowledge of a crowdsourced community in which threat intelligence is aggregated and updated instantly.

Identified attacks can then be turned into new detections, learned by the algorithm, and shared with others within the cloud network to prevent the same attack elsewhere, sending the bad actors back to the drawing board.

In security today, one of the particularly challenging areas to address is combating malware-free intrusions. According to an industry study, over 60% of intrusions don’t actually involve any malware, but instead leverage stolen credentials and ‘living-off-the-land’ techniques such as the use of PowerShell and legitimate Windows tools. This is why behavior-based detection and machine learning prevention mechanisms are key to blocking such attacks, as well as never-before-seen malware, ransomware variants or sophisticated exploits.

Utilizing machine learning to solve cybersecurity problems is one of the truly promising developments in our field over the last few years. It enables us to scale the knowledge of skilled human analysts to large data sizes and to increase the scope of analysis to levels of complexity beyond human cognition. But as long as humans are behind cyberattacks, there will be humans working to protect networks, using machine learning as a tool to work with more data at greater breadth.

If applied correctly, machine learning can dramatically augment an organization’s ability to fight off sophisticated cyberattacks while deriving more value out of security data and threat intelligence.

(About the authors: Dr. Sven Krasser currently serves as Chief Scientist at CrowdStrike where he leads the machine learning efforts utilizing CrowdStrike’s Big Data platform. He has authored numerous peer-reviewed publications and is co-inventor on more than two dozen patented network and host security technologies. Dmitri Alperovitch is the co-founder and CTO of CrowdStrike, leading its intelligence, technology and CrowdStrike labs teams. With more than a decade of experience in the field of information security, Alperovitch is an inventor of twenty-six patented technologies and has conducted extensive research on various topics across the information security landscape.)