Big data has a big problem: How to support real-time analytics
I’m stating the obvious when I say big data, business analytics platforms and the IoT have IT and business development teams scrambling to see what benefits stand to be gained by analyzing the massive amounts of data their systems generate. But with organizations generating terabytes of data every day, the time required to upload, categorize, clean, and analyze that data to gain any insight from it can be significant.
Most organizations implement batch data processing schedules in which new data from client devices is uploaded to the datacenter for analysis at scheduled intervals, not continuously. This batch data processing method can take anywhere from a few hours to a few days to complete. But if the IoT and big data are going to revolutionize the world, it’s time for the big data industry to figure out a way to address the need for real-time data analysis.
While batch processing is sufficient for many applications that don’t require real-time performance adjustments, there are vertical markets (healthcare, for example) where decisions need to be made and implemented in seconds to avoid potentially catastrophic problems.
Additionally, after the data is analyzed and the analytics platform recommends a change to a client device or process, if that change requires a human to implement it, the time between spotting a problem and fixing it grows even longer. And in a real-time environment, I’m not exaggerating when I say that delay can be the difference between life and death.
The need to deliver real-time analytics becomes more challenging when the connected nature of the IoT is added to the mix. Specifically, endpoint devices in an IoT network are not guaranteed to be connected to the Internet at all times.
For example, what if the user of an IoT-connected medical device is travelling and can’t get the device online due to poor signal quality or technical issues? The device itself would need to be able to interpret data and determine the appropriate action based on that data. Automating that process is the only way to deliver real-time performance until the device is able to connect to the Internet and call home for instructions or updates. So what’s an organization looking for real-time analytics to do?
Conduct analytics at the endpoint. With data templates and analytics algorithms preloaded on the endpoint device, data can be formatted locally and acted upon at the endpoint if a connection to the datacenter can’t be made.
I want to be clear here that I’m not talking about a pre-defined list of rules that instructs an endpoint to take a specific action when certain parameters are met. Trying to anticipate every potential problem and developing a rule to handle each one locally requires too much time and effort. The more effective approach is for the endpoint device to use its own processing resources to power analytics at the local level, and to use machine learning to identify the proper course of action.
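To make the distinction concrete, here is a minimal sketch of an endpoint that learns its own baseline from the readings it sees instead of relying on a hand-written threshold. It is an illustration under stated assumptions, not a reference design: it uses a simple online mean and variance (Welford's algorithm) as a stand-in for real machine learning, and the names `RunningBaseline` and `handle_reading`, the action strings, and the `threshold` and `warmup` values are all hypothetical.

```python
import math


class RunningBaseline:
    """Learns a per-device baseline online (Welford's algorithm).

    Hypothetical stand-in for on-device learning: no limits are hard-coded,
    the device derives "normal" from its own history.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.n = 0            # number of readings absorbed so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations
        self.threshold = threshold  # assumed z-score cutoff
        self.warmup = warmup        # readings needed before judging

    def update(self, x):
        """Fold one reading into the running mean and variance."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomalous(self, x):
        """True if x deviates strongly from the learned baseline."""
        if self.n < self.warmup:
            return False  # not enough history to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return x != self.mean
        return abs(x - self.mean) / std > self.threshold


def handle_reading(baseline, x, online):
    """Return the (hypothetical) action the endpoint takes for one reading."""
    if not baseline.is_anomalous(x):
        baseline.update(x)  # only normal readings refine the baseline
        return "ok"
    # With a connection, let the datacenter decide; without one, act now.
    return "defer_to_datacenter" if online else "act_locally"
```

Calling `handle_reading(baseline, reading, online=False)` in the device's sampling loop would keep it responsive while disconnected. A production device would presumably use a far richer model, but the shape is the same: learn locally, decide locally, and call home when a connection is available.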
Some will argue that the actual endpoint devices aren’t guaranteed to have the memory and processing resources needed to conduct the required analytics. True, but we know the device has some sort of connectivity solution in place (Wi-Fi, LTE, Bluetooth, etc.), so why can’t that connection be used to leverage the capabilities of another nearby device?
For example, Microsoft Research recently demonstrated an IoT-enabled wearable, called Emma, that can help Parkinson’s disease patients control their tremors. While the specific capabilities of the Emma device weren’t provided, it’s reasonable to assume that an OEM mass-producing such a product would want to keep the price competitive to foster patient adoption.
It’s also reasonable to assume patients carry a smartphone, and that the smartphone has enough processing horsepower to conduct real-time analytics and then push any new instructions to the Emma device if necessary. Accordingly, the device OEM can keep costs down by not requiring significant processing or memory resources in the wearable.
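The offloading decision described here can be sketched in a few lines. This is only an illustration: the `ComputeNode` type, the `choose_analytics_host` function, and the memory figures are assumptions of mine, not details of the Emma device or of any real product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputeNode:
    """Hypothetical description of a device that could run the analytics."""
    name: str
    free_memory_mb: int
    reachable: bool


def choose_analytics_host(
    wearable: ComputeNode,
    companion: ComputeNode,
    required_memory_mb: int,
) -> Optional[str]:
    """Pick where to run analytics: the wearable, a nearby device, or nowhere."""
    # Run on the wearable itself if it happens to have enough headroom.
    if wearable.free_memory_mb >= required_memory_mb:
        return wearable.name
    # Otherwise borrow a nearby device over the local link (e.g. Bluetooth).
    if companion.reachable and companion.free_memory_mb >= required_memory_mb:
        return companion.name
    # No capable host is available right now.
    return None
```

With a low-cost wearable that has only a few megabytes free and a paired smartphone in range, the function returns the smartphone's name. If the phone is out of range it returns `None`, and the wearable would have to fall back on whatever minimal logic it can run itself.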
Big data will indeed revolutionize many applications, but those requiring real-time performance will need some additional data preparation and analysis at the local level if they’re going to deliver the experiences customers expect.