for Information Management Blogs
AUG 3, 2010 11:14pm ET

(Really) Big Data

My mailbox and newsfeeds are littered lately with references to "big data," that eye-catching phrase that subtly implies something powerful, like "El Niño" or "controlled fusion."

And though there's a lot of hype around big data -- I'm sure someone is printing "We Do Big Data" T-shirts as we speak -- there is a new frontier and reality to the term.

Last week on DM Radio, Eric Kavanagh and I were talking to guests who included Jim Kobielus from Forrester Research and Neil McGovern from Sybase. (If you haven't checked out these shows, you should. They're spontaneous and fun, and a lot of what gets said there never gets into print.)

But anyway, I mentioned my observation about "big data" as the hype cycle du jour, while knowing anecdotally what was going on with ultra-large-scale data processing and throughput that's been abetted by cheap computing and storage, MPP, in-memory databases, etc.

McGovern was talking about complex event processing or CEP (another pregnant term) and the idea of bringing the data to the query, rather than the other way around, which is something we're seeing more of now with in-memory analytic processing and the advent of the petabyte-scale data warehouse.

In truth, all sorts of pattern matching is coming online with clickstream data feeds and/or high-speed computer trading. Think financial services or telco, where a single business will have 50 million call records per day.

Or look at China Mobile and its half billion users, McGovern says, and what's generated out of one call per customer per day. "You'd be monitoring in real time with very low latency because you don't structure the data and put it on disk, it's all handled in memory. That allows you to set up pattern detection so you can look for patterns in the data coming in."
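To make the in-memory idea concrete, here is a minimal sketch of pattern detection over a stream that never touches disk. It is not how Sybase's engine or any other CEP product works; the event shape (a caller ID and a timestamp in seconds), the class name `BurstDetector`, the 60-second window and the threshold are all assumptions invented for illustration.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags a key that produces more than `threshold` events within `window_s` seconds.

    Hypothetical example: all state lives in memory, nothing is written to disk.
    """

    def __init__(self, window_s, threshold):
        self.window_s = window_s
        self.threshold = threshold
        self.recent = defaultdict(deque)  # key -> timestamps still inside the window

    def observe(self, key, ts):
        q = self.recent[key]
        q.append(ts)
        # Evict timestamps that have aged out of the sliding window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.threshold


def detect(events, window_s=60.0, threshold=3):
    """Run the detector over (key, timestamp) pairs and return the events that trip it."""
    detector = BurstDetector(window_s, threshold)
    return [(key, ts) for key, ts in events if detector.observe(key, ts)]


# Made-up call records: caller "A" places four calls within 30 seconds.
sample = [("A", 0), ("B", 5), ("A", 10), ("A", 20), ("A", 30), ("B", 200), ("A", 300)]
alerts = detect(sample)
print(alerts)
```

Each event is checked against the standing pattern as it arrives, and old timestamps are dropped as soon as they leave the window, so memory use tracks the window size rather than the total volume of the feed.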

That's what CEP tends to be about, a kind of forensic analysis (like a series of corpses to pick over moving by on a conveyor belt) of something that's actually just an invisible blur. McGovern says "one of the faster feeds" he knows about in financial services uses an engine that monitors one million transactions per second.

Ahem. I stopped to confirm he'd said "one million transactions per second."
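For a sense of scale, the figures quoted above can be put on the same footing with some back-of-the-envelope arithmetic. This is only a rough sketch: it assumes events are spread evenly across the day, which real call and trading traffic is not, and the variable names are my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day):
    """Average events per second, assuming an even spread across the day."""
    return events_per_day / SECONDS_PER_DAY


telco_rate = per_second(50_000_000)           # 50 million call records per day
china_mobile_rate = per_second(500_000_000)   # half a billion users, one call each per day
feed_rate = 1_000_000                         # the trading feed, already per second
feed_per_day = feed_rate * SECONDS_PER_DAY    # what that feed adds up to over 24 hours

print(f"Telco:        about {telco_rate:,.0f} records per second")
print(f"China Mobile: about {china_mobile_rate:,.0f} calls per second")
print(f"Trading feed: {feed_rate:,} per second, or {feed_per_day:,} per day")
print(f"Feed vs. China Mobile: about {feed_rate / china_mobile_rate:.0f}x")
```

On those assumptions, the 50-million-record telco averages a few hundred records per second and China Mobile a few thousand, while the trading feed runs more than a hundred times faster than either.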

If you're into big data, you might already know this. But if you're not heaving data at high velocity, don't fret. The good news is that the rest of us have useful resources of data and processing coming online that can provide all sorts of value without turning into something you need to shield your eyes from. 

McGovern says the three variables in this kind of work are the volume of data, the velocity of the data and the complexity of the analysis, all weighed against how much time you have to respond before the analysis stops being useful.

Those guidelines apply just as well to conventional (but souped-up) data streams you can use to help run your business. Part of my job, for example, could be called Web manager, though it's a role that's really tended to by committee, meaning we're probably not taking it seriously enough.

If I stop and look at an hourly accounting of what's happening for our business in terms of reader behavior and interest, I can revise our content accordingly, tweak headlines, drop and elevate stories. I don't do it as much as I could, but I am paying closer attention now to cause and effect, just like the big boys at NBC's online properties where I used to work.

Maybe that is the point and the reason it's increasingly worth your time to learn about what's happening in the realm of rapidly processed data streams. You might find the same opportunity in free Google Analytics or some in-memory view of customer interactions that someone else needs to build a nitrogen-cooled data center to support.

It doesn't have to be on some monstrous scale to deserve your attention.

It just has to be useful.

Comments (2)
Whether companies are looking at 100TB of data to identify dropped calls and cell phone tower placement or whether online gaming companies are querying hundreds of TB of data to drive additional value for gamers, it's an exciting time to help companies derive maximum value from all this information. The merging of increasing, new data channels from online transactions and behaviors and increasing computing power has an inevitable impact on business analytics. It's exciting to hear how companies are innovating and how they are monetizing that data for positive business impact. To your point, Jim, data doesn't need to be on a monstrous scale - the speed at which insight can be garnered for 2TB can have just as much significant impact as for larger scales.
Posted by Jennifer S | Wednesday, August 04 2010 at 3:01PM ET
Is "big data" really just petabytes? A few years ago it was terabytes... It's not the volume or the velocity that makes something "big data". It is big data when I can extract value from ALL of it. Ideally, of course, I'd like to extract every last penny of value out of my data but even for us mere mortals, I'd say big data is when I'm doing something useful with all of my data. Just sampling and performing analysis on those samples is definitely not big data. Sipping from the firehose isn't impressive. Big data is when I can scale to process the volume, keep up with the firehose, and afford to crunch through all of that data that I can get my hands on. Anything less is just a big number.
Posted by Davin P | Friday, August 06 2010 at 12:40AM ET
Blog Archive for Jim Ericson
