Ventana Research
for Information Management Blogs
JAN 25, 2012 3:38pm ET

Big Data is More Than Hadoop

We recently published the results of our benchmark research on Big Data to complement the previously published benchmark research on Hadoop and Information Management. Ventana Research undertook this research to acquire real-world information about levels of maturity, trends and best practices in organizations’ use of large-scale data management systems now commonly called Big Data. The results are illuminating.

Volume, velocity and variety of data (the so-called three V’s) are often cited as characteristics of big data. Our research offers insight into each of these three categories. Regarding volume, over half the participating organizations process more than 10 terabytes of data, and 10% process more than 1 petabyte of data. In terms of velocity, 30% are producing more than 100 gigabytes of data per day. In terms of the variety of data, the most common types of big data are structured, containing information about customers and transactions.

However, nearly one-third (31%) of participants are working with large amounts of unstructured data. Nine out of 10 participants rate scalability and performance as the most important evaluation criteria, which suggests that, of the three V’s, volume and velocity are more important concerns than variety.

This research shows that big data is not a single thing with one uniform set of requirements. Hadoop, a well-publicized technology for dealing with big data, gets a lot of attention (including from me), but there are other technologies being used to store and analyze big data.

The research data shows an environment that is still evolving. The majority of organizations still use relational databases, but not exclusively: more than 90% of participants using relational databases also use at least one other technology for some of their big-data operations. One-third (34%) are using data warehouse appliances, which typically combine relational database technology with massively parallel processing. About as many (33%) are using in-memory databases. Each of these alternatives is more widely used than Hadoop. In addition, 15% use specialized databases such as columnar technologies, and one-quarter (26%) are using other technologies.

While these technologies enable organizations to do things they haven’t done before, there is no technological silver bullet that will solve all big-data challenges. Organizations struggle with people and process issues as well. In fact, our research shows that the most troublesome issues are not technical but people-related: staffing and training. Big data itself and these new approaches to processing it require additional resources and specialized skills. Hence we see high levels of interest in big-data industry events such as Hadoop World and the Strata Conference. Recognizing the dearth of trained people, some academic institutions have launched degree programs in analyzing big data, and IBM has started Big Data University.

Research participants cited real-time capabilities and integration as their key technical challenges. The velocity with which they generate data and the fact that over half the organizations analyze their data more than once a day are forcing them to seek real-time capabilities; the pace of business today demands that they extract all useful information as soon as possible to support rapid decision-making.

With respect to integration, fewer than half of participants are satisfied with integration of third-party products, and almost two-thirds cite lack of integration as an obstacle to analyzing big data. Three-quarters have integrated query and reporting with their big-data systems, but more advanced analytics such as data mining, visualization and what-if analysis are seldom available as integrated capabilities. Responding to such comments, vendors have been racing to integrate their business intelligence and information management products with big-data sources. As you consider big-data projects and technologies, make sure that the vendors you select can handle the big-data sources you must use.

Looking ahead, we expect more changes in this evolving landscape. In some ways big-data challenges, and the presence of Hadoop in particular, have paved the way for other technologies besides relational databases. NoSQL alternatives, such as Cassandra, MongoDB and Couchbase, are gaining notice in enterprise IT organizations after the success of Hadoop. In-memory databases were once considered a niche technology, but SAP now regards its in-memory product, HANA, as its primary big-data analytical platform. There are differing opinions about whether these various big-data technologies will converge or diverge. We can look to the past for some indications of where the market might go. Over the years a variety of alternatives to relational databases have emerged, including OLAP, data warehouse appliances and columnar databases; each eventually was absorbed into relational databases.

We also see signs of the major relational vendors embracing big-data technologies. IBM acquired Netezza for its massively parallel data warehouse appliance technology. IBM has also invested heavily in Hadoop. Oracle introduced its own line of data warehouse appliances and recently brought a big-data appliance to market that includes Hadoop and NoSQL technologies. Microsoft has invested in massively parallel processing and Hadoop. We also see independent vendors such as Hadapt combining relational database technology with Hadoop. The past is not necessarily an indication of the future, but our research and recent market dynamics suggest that it may be premature to write off the relational database vendors as out of touch.

In light of this information, I recommend that your organization explore various alternatives for solving specific challenges. At a minimum you should be aware of the alternatives so when the need arises you will know what is available. Use our big-data research to guide your use of these technologies and to help avoid some of the obstacles they present so you can be more successful in applying big data to business decisions.

This blog originally appeared at Ventana Research.

 

Comments (2)
David, great insight on these Big Data trends. I'd like to point out that HPCC Systems is a mature platform which presents numerous benefits over Hadoop and other alternatives, including a consistent and homogeneous architecture, an end-to-end solution including data workflows and data delivery (through the Thor and Roxie components), and a high-level, data-oriented Domain Specific Language (DSL) called ECL, which enables analysts to quickly and effectively define complex data ETL (Extraction, Transformation, Loading), Linking and Delivery processes. The HPCC Systems platform is quickly evolving to include ready-to-use and completely parallel implementations of solutions in different areas, such as Machine Learning, Statistical analysis and Document processing. For a more complete picture, visit http://hpccsystems.com
Posted by HAANA M | Friday, January 27 2012 at 7:53PM ET
Hello David -

Thanks for sharing the research of your recent study. Your thoughts are definitely in line with the advice that we are providing our clients. I especially appreciate your comment that there is no technological silver bullet, which you elaborate with a discussion about people and process. In addition, we feel there is not a "one size fits all" approach for the design of the technical infrastructure. Although we now have technologies that allow organizations to process all of the data, with an abundance of models and variables with high performance computing technologies such as in-memory, in-DB and grid, we also see a real need for identifying relevant information up-front in the process. This includes leveraging analytics on the front end of the process - so instead of always landing the data, you use analytics based on organization context to determine relevance, an approach that we refer to as "stream it, score it, store it".

On the Hadoop side, it's important for an enterprise to understand the implications of big data and how the new tools work before embarking on a big-data initiative. Keith Collins, our CTO, noted that "Those who are just standing up Hadoop as is, with no management framework, writing directly to it ... there's going to be some real disillusionment there. The data issues come after the question." He goes on to say that enterprises have to know what they want to find out from their data and then deal with how to get that out of their data.

We are blogging extensively on the role of Big Data Analytics and Hadoop, including a post which you can find here - http://blogs.sas.com/content/datamanagement/

Thanks,

Mark Troester, IT/CIO Thought Leader & Strategist, SAS, Twitter @mtroester

Posted by Mark T | Monday, January 30 2012 at 10:56PM ET

Blog Archive for David Menninger

Predictive Analytics: Special Skills Needed
SAS: SaaS and Big Data in Store
A Big, Cloudy, Mobile and Social World
What Enterprises Can Learn from 2011's Major Events and Surprises
Yellowfin 6 Advances Collaboration and Mobility Capabilities

More from David Menninger »
