January reader favorite: The Intel chip vulnerability: Understanding Meltdown and Spectre
There’s a tempest in progress – and, no, I’m not talking about the “bomb cyclone” currently hitting the US eastern seaboard. Instead, I’m referring to what’s going on in the technology and security communities in the wake of the newly published Meltdown and Spectre issues.
Understanding what these issues are is important for practitioners, regardless of whether you are a security, governance, risk, or assurance professional: not only do these issues require action to address, but there’s also a significant amount of coverage in the mainstream and trade press. This in turn means interest from board members and senior leadership, and concern from other teams in our organizations. Having a solid understanding of what the issues are (and a realistic view of their impacts and future ramifications) is therefore both valuable and necessary.
With that in mind, let’s unpack these two issues: what they are, why they matter, and what organizations can do about them. In the days ahead, we’ll discuss lessons learned, how organizations can fold those lessons into their broader planning, and how to communicate risk impacts up the chain – but first let’s unpack the issues themselves and why there’s so much hubbub about them.
What are the issues?
Meltdown and Spectre are similar in that they both relate to speculative execution side-channel attacks – meaning, they both exploit the “speculative execution” feature in modern hardware design to enable side-channel attacks. Attacks against what? In the case of Meltdown, the isolation between user mode applications and the operating system (kernel space). In the case of Spectre, the segmentation between different applications. Key to understanding these attacks are two separate (but related) concepts: speculative execution and side-channel attacks. Let’s walk through them to explain the mechanics of their operation.
First, speculative execution. Conceptually, every computing device since Turing’s time executes a series of instructions in sequential (linear) fashion. A computer program instructs the processor to perform operation A, followed by operation B, and then operation C. The order matters, and the order of execution is dictated by the program being run. Think of it like a recipe for baking bread: first you proof the yeast, then you add flour, then eggs, then water, and so on. In this analogy, the recipe being followed is akin to the program being executed, while the person doing the cooking is akin to the computer’s processor.
As anybody who has ever followed a recipe knows, though, sometimes individual steps can take a while to complete – proofing the yeast or preheating the oven, for example. Someone following a recipe that includes those steps could choose to sit idly by while those things complete (i.e., while the oven preheats or the yeast proofs), but doing so would make them a pretty inefficient cook. More reasonably, they might choose to progress to other tasks while steps that take longer happen in the background.
For example, they might flour a kneading surface, prep other ingredients they’ll need, or get baking pans ready while the oven is preheating. At a macro level, the order is the same as dictated by the recipe (i.e., preheating happens first before you put the bread into the oven), but at a micro level, steps are adjusted in their order to optimize efficiency. Believe it or not, computers do exactly this, too.
At a hardware level, a program might dictate that operations A, B, and C are executed in sequential order. Just like preheating the oven, though, individual steps might take longer to complete. For example, operation B might require accessing memory, which takes longer than operations A and C, which can be done entirely within the bounds of the processor, using only its registers. To optimize performance, just as the cook moves on to other things, so, too, does the computing hardware: it might perform operation C ahead of time while it waits for operation B to complete (provided, of course, that C doesn’t depend on B’s result).
Speculative execution builds on this relatively simple concept but takes it one step further: it is the pre-execution of certain tasks that depend on the outcomes of other, not-yet-completed tasks. In essence, the processor makes a “guess” about which path will be followed based on what it has seen happen in the past. If the outcome is different, the results of any pre-execution are rolled back and no time is lost (timing-wise, it’s the same as if the processor had been idle), but if things go as expected, significant time savings are realized.
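To make the guess-and-rollback idea concrete, here is a toy simulation (not real processor internals – the class, names, and “predictor” are illustrative inventions): a mock processor pre-executes the branch direction it saw last time, keeps the speculative result when the guess was right, and discards it when the guess was wrong.

```python
# Toy illustration of speculative execution (NOT how real hardware works):
# the "processor" pre-executes the branch it predicts will be taken, based
# on the last outcome it observed, and rolls back work on a misprediction.

class SpeculativeProcessor:
    def __init__(self):
        self.last_outcome = True   # naive predictor: assume the last outcome repeats
        self.committed = []        # results that were actually kept
        self.rolled_back = 0       # count of speculative results that were discarded

    def run_branch(self, condition, if_taken, if_not_taken):
        # Speculate: start executing the predicted path before the
        # (slow-to-resolve) condition is actually known.
        guess = self.last_outcome
        speculative_result = if_taken() if guess else if_not_taken()

        # The condition finally resolves.
        if condition == guess:
            # Guess was right: commit the pre-executed work, time saved.
            self.committed.append(speculative_result)
        else:
            # Guess was wrong: discard the speculative result and redo.
            self.rolled_back += 1
            self.committed.append(if_taken() if condition else if_not_taken())
        self.last_outcome = condition


cpu = SpeculativeProcessor()
# Taking the same branch repeatedly "trains" the predictor...
for _ in range(5):
    cpu.run_branch(True, lambda: "fast path", lambda: "slow path")
# ...so a sudden change of direction causes one misprediction and rollback.
cpu.run_branch(False, lambda: "fast path", lambda: "slow path")

print(cpu.rolled_back)     # 1 – a single mispredicted branch
print(cpu.committed[-1])   # "slow path" – the correct result after rollback
```

Note that even after the rollback, the final committed results are always correct – which is exactly why the rollback itself was long assumed to be harmless.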
The second thing to understand is that these are side-channel attacks; meaning, they leverage physical characteristics of the implementation, such as timing (i.e., “side-channel information”), to operate. As noted above, when speculative execution occurs and the processor makes a “guess” that turns out to be wrong, what’s supposed to happen is that the processor disregards the outcome of the pre-execution and unwinds to the state from before the speculation. And, in terms of the state the program can see directly, that’s exactly what happens.
However, the act of performing the speculative execution can leave behind state changes – most notably in the processor’s cache – that are measurable by an attacker. Consider, for example, an attacker wishing to know whether a given value is present in the cache. In a situation where attackers can cause that value to be cached, they might be able to measure the time required to perform an access and deduce whether the value is present based on how long the access takes (an access takes measurably longer when the value is not cached than when it is). This is an example of a side-channel attack.
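The timing deduction described above can be sketched with a toy cache model (the class, addresses, and timing constants are made up for illustration – real hit/miss gaps are measured in processor cycles, not these arbitrary units):

```python
# Toy model of the cache-timing side channel described above. All numbers
# are hypothetical; the point is only that an attacker who can measure
# nothing but elapsed time can still deduce what is in the cache.

CACHE_HIT_TIME = 1     # arbitrary units: cached data returns quickly
CACHE_MISS_TIME = 100  # a cache miss (fetch from memory) is much slower

class ToyCache:
    def __init__(self):
        self.lines = set()

    def access(self, address):
        """Return the simulated access time; caching happens as a side effect."""
        if address in self.lines:
            return CACHE_HIT_TIME
        self.lines.add(address)   # first access pulls the line into the cache
        return CACHE_MISS_TIME


cache = ToyCache()
cache.access(0x1000)   # something (e.g., speculative execution) touches this address

# The attacker now probes two addresses and compares only the timings:
t_touched = cache.access(0x1000)     # fast – the line is already cached
t_untouched = cache.access(0x2000)   # slow – first access, cache miss

# A fast access means the address was already in the cache, which in turn
# means it was touched earlier – information leaked purely through timing.
print(t_touched < t_untouched)   # True
```

This is why the speculative rollback isn’t as harmless as it looks: the architectural results are discarded, but the cache footprint of the speculated work survives and can be timed.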
The salient point is that these attacks leverage the speculative execution functionality to learn information from across boundaries: from across the user mode/kernel mode boundary in the case of Meltdown and from across application boundaries in the case of Spectre.
What can you do?
Once you understand conceptually what the issues are, the next logical question is what can be done about them. And, unfortunately, the answer will be different from organization to organization. That said, there are a few important things to keep in mind as we all deal with the fallout from these issues.
First, it bears outlining that these are hardware rather than software issues. Because the speculative execution feature is implemented in the firmware and hardware on which operating systems and applications run, it impacts a wide swath of any given technology ecosystem. This means that chances are pretty good that your environment is impacted … and heavily.
This is true regardless of whether you’re talking about technology that’s on-premises or in the cloud, regardless of the OS in use, and even in devices that you might not expect (think IoT). In fact, the scale of impact is exactly why these issues are receiving so much press coverage and executive attention.
Second, note that patches are available for many platforms, with more on the way. Apply them as they become available, and keep on top of the vendor information and workarounds for the platforms you use. Again, this is a hardware issue at its root, so understand that current software workarounds can only go so far (in large part, they work by disabling some of the performance optimizations that are part of the underlying issue).
Getting to full remediation is going to be a long road to travel. The performance optimizations from speculative execution have been around for decades and these issues are only now coming to light. And, ultimately, it’ll take hardware design changes to fully address the issues. This will take time, patience, and clearheaded thinking to address over the long haul – meaning that panicking about it now isn’t useful.
Lastly, keep in mind that these are: a) information disclosure issues that are b) not being actively exploited in the wild. Both of those could change as further research occurs or as bad guys get creative in leveraging the issues to conduct attacks. For now, however, it’s about getting access to data rather than gaining unauthorized access to systems via these issues. The point is that keeping an objective, risk-informed posture (including a clear head and staying free from panic) is important.
(This post originally appeared on the ISACA blog.)