My mailbox and newsfeeds are littered lately with references to "big data," that eye-catching phrase that subtly implies something powerful, like "El Niño" or "controlled fusion."
And though there's a lot of hype around big data -- I'm sure someone is printing "We Do Big Data" T-shirts as we speak -- there is a new frontier and reality to the term.
Last week on DM Radio, Eric Kavanagh and I were talking to guests who included Jim Kobielus from Forrester Research and Neil McGovern from Sybase. (If you haven't checked out these shows, you should; they're spontaneous and fun, and a lot of what gets said there never gets into print.)
But anyway, I mentioned my observation about "big data" as the hype cycle du jour, while knowing anecdotally what was going on with ultra-large-scale data processing and throughput that's been abetted by cheap computing and storage, MPP, in-memory databases, etc.
McGovern was talking about complex event processing or CEP (another pregnant term) and the idea of bringing the data to the query, rather than the other way around, which is something we're seeing more of now with in-memory analytic processing and the advent of the petabyte-scale data warehouse.
In truth, all sorts of pattern matching are coming online with clickstream data feeds and high-speed computer trading. Think financial services or telco, where a single business will have 50 million call records per day.
Or look at China Mobile and its half billion users, McGovern says, and what's generated out of one call per customer per day. "You'd be monitoring in real time with very low latency because you don't structure the data and put it on disk, it's all handled in memory. That allows you to set up pattern detection so you can look for patterns in the data coming in."
That's what CEP tends to be about: a kind of forensic analysis (like picking over a series of corpses moving by on a conveyor belt) of something that's actually just an invisible blur. McGovern says "one of the faster feeds" he knows about in financial services uses an engine that monitors one million transactions per second.
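McGovern's point about keeping the stream in memory and watching for patterns as data arrives can be sketched in a few lines. This is only an illustrative toy, not Sybase's CEP engine or any real product: the event shape (a key plus a timestamp) and the sliding-window frequency rule are assumptions made for the example.

```python
from collections import defaultdict, deque

class SlidingWindowDetector:
    """Flag any key that generates more than `threshold` events within a
    `window`-second sliding window. Everything stays in memory; nothing
    is structured and written to disk first."""

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)  # key -> timestamps still in window

    def observe(self, key, timestamp):
        q = self.events[key]
        q.append(timestamp)
        # Evict timestamps that have fallen out of the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.threshold  # True means the pattern fired

# Hypothetical usage: flag a caller placing more than 3 calls in 60 seconds.
detector = SlidingWindowDetector(window=60, threshold=3)
alerts = [detector.observe("555-0100", t) for t in (0, 10, 20, 30, 95)]
# The fourth call lands inside the same window as the first three and
# trips the threshold; by the fifth call the window has emptied again.
```

A production CEP engine does far more (joins across streams, ordering guarantees, query languages), but the core idea is the same: the query waits in memory and the data flows past it, rather than the data sitting on disk waiting for a query.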
Ahem. I stopped to confirm he'd said "one million transactions per second."
If you're into big data, you might already know this. But if you're not heaving data at high velocity, don't fret. The good news is that the rest of us have useful data and processing resources coming online that can provide all sorts of value without turning into something you need to shield your eyes from.
McGovern says the three variables in this kind of work are the volume of the data, the velocity of the data and the complexity of the analysis, all weighed against how much time you have to respond for the analysis to be useful.
Those guidelines apply just as well to conventional (but souped-up) data streams you can use to help run your business. Part of my job, for example, could be called Web manager, though it's a role that's really tended to by committee, meaning we're probably not taking it seriously enough.
If I stop and look at hourly accounting of what's happening for our business in terms of reader behavior and interest, I can revise our content accordingly, tweak headlines, drop and elevate stories. I don't do it as much as I could, but I am paying closer attention now to cause and effect, just like the big boys at NBC's online properties where I used to work.
Maybe that is the point, and the reason it's increasingly worth your time to learn about what's happening in the realm of rapidly processed data streams. You might find the same opportunity in free Google Analytics or in some in-memory view of customer interactions that someone else needs to build a nitrogen-cooled data center to support.
It doesn't have to be on some monstrous scale to deserve your attention.
It just has to be useful.