Big Data’s Three-Legged Stool
It’s an intermittently cloudy Tuesday morning in San Jose, but no matter. I’m stuck in a hotel ballroom at a big data conference and haven’t seen sunlight in 36 hours. A young woman pulls up a chair next to mine. Unfurling her iPad cover, she explains that her boss has sent her to this event to learn all she can about big data and report back.
I ask her why. She looks up from her screen, surprised by my question. “So we can start watching social media comments,” she says, as if she’s explaining why I should tie my shoelaces and get a good night’s sleep.
We chat. It turns out she works in the marketing department of a mortgage lender with a new executive team. The CEO is taking aim at the marketing vice president for being overly reliant on print ads. “They want social analytics,” my marketing friend explains. “They want to understand what people are saying about us on Twitter. I’m not sure why. I’m really here to learn about open source and Hadoop.”
This is how the conversation goes with big data. At conferences and in conference rooms, people are intrigued by the idea of mining proliferating volumes of unstructured data, suspecting that it will help their businesses. But they admit they don’t completely understand what technologies to use or where to start. Meeting this unspoken demand, big data blog posts, tweets and conference keynotes often feature only two legs of the stool: the platform and the analytics. But it shouldn’t stop there. Data governance and data management form big data’s critical third leg on which to build business value.
When people talk about big data platforms, they’re typically referring to an appliance that is purpose-built for large, complex data volumes. Big data platforms often involve commodity hardware and open source software, making adoption easy and data analysis whip-fast. Incumbent technologies like data warehouses or master data management hubs aren’t optimized for this type of processing. As companies research emerging technologies, what they really want to know is not only how those technologies work, but how they fit into increasingly unwieldy technology infrastructures.
Big data analytics conversations are curiously decoupled from big data. Industry events routinely feature success stories on customer churn, network optimization, life stage marketing and sales trend analysis. These are important capabilities, to be sure, but they are all delivered with advanced analytics tools most companies are likely to have in house today. At the conference I tweeted: “I’m in the Analytics track and so far haven’t heard anything about big data.” I was e-chastised for being in the wrong track. “Come to the Platform Track!!!” tweeted a helpful delegate.
Clearly, to exploit the increasingly complex formats and growing volumes of data flowing in multiple directions inside and outside of our companies, we need the new crop of specialized technologies. Bona fide big data analytics require new, high-performance, purpose-built solutions. Think of smart meter data streaming into utility companies, allowing consumers to regulate their own electricity usage in real time. Or consider patient vital signs from across hospitals and clinics loading throughout the day into a Hadoop cluster, where hospital staff use them for outcomes monitoring and bed assignments.
Managing Big Data for Big Value
Acquiring specialized technology and maturing analytics behaviors aren’t easy. But what people don’t know, at least not yet, is that the hard part of big data is managing it. The challenges of identifying and sourcing the data, applying data correction rules, circumscribing usage, access, and storage policies, and provisioning the data to other platforms and applications demand a rigor of their own. Regulatory requirements mandate that your bank mask Social Security numbers before making half a billion credit card transactions available to hungry data scientists hoping to fortify themselves on fraud indicators. Simply applying a new file system and some statistics to the problem without first applying business rules to the data can mean large fines and, maybe worse, additional regulatory scrutiny.
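To make the masking example concrete, here is a minimal sketch of one such business rule, applied to records before they are handed to analysts. Everything in it is an assumption for illustration: the field name `ssn`, the function names, and the choice to keep the last four digits are hypothetical, and a real bank’s masking policy would be set by its compliance team rather than by a regular expression.

```python
import re

# Hypothetical pattern: nine digits written as 123-45-6789 or 123456789.
# The word boundaries keep it from matching inside longer digit runs
# such as 16-digit card numbers.
SSN_PATTERN = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Hide all but the last four digits of anything that looks like an SSN."""
    return SSN_PATTERN.sub(lambda m: "***-**-" + m.group(3), text)


def mask_records(records, fields=("ssn",)):
    """Yield copies of each record with the named fields masked.

    The input records are left untouched; 'fields' is an assumed list of
    column names that may carry Social Security numbers.
    """
    for record in records:
        masked = dict(record)
        for field in fields:
            if masked.get(field) is not None:
                masked[field] = mask_ssn(str(masked[field]))
        yield masked


if __name__ == "__main__":
    transactions = [{"ssn": "123-45-6789", "amount": 42.50}]
    for row in mask_records(transactions):
        print(row)
```

The point of the sketch is the ordering, not the pattern: the rule runs before the data reaches the analytics platform, so governance is a step in the pipeline rather than an afterthought.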
Not to put too fine a point on it, but data governance was also the hard part of business intelligence, data mining and CRM projects. Seeking to leverage the promise of these technologies early, many companies wound up over-investing before realizing they’d severely underestimated the complexity of their data. And that was before social media, Web logs and sensor data started pushing the limits.
I’m not one to hand-wring. I see the promise of big data and the foresight of investing in the technologies that process and provision it. Big data will not only let us save our customers from defecting, but could save our planet from overheating. The promise of big data analytics is as expansive as our imaginations. But I’ve also seen the garbage-in, garbage-out phenomenon writ large on the balance sheets of naïve executive teams. Solid data governance and data management processes can mean the difference between new legacy technologies and innovative business actions.
I asked my new marketing buddy a few questions about third-party data enrichment, data stewardship, customer record reconciliation and data privacy policies. She didn’t know much about how her company governed its data, and seemed distracted by the conference program.
“I just have one question,” she said. “What’s Splunk?”