In a survey of IT and data professionals conducted by Qubole and Dimensional Research on the progress of their big data initiatives, only 8 percent of respondents considered their initiatives fully mature, and 78 percent still support data requests on a project-by-project basis.

Those results highlight a weakness at the heart of most big data initiatives. Despite rapid innovation in big data technologies, people and manual effort are still very much required to keep those initiatives running, and people don't scale. This human bottleneck limits an organization's ability to scale and operationalize its use of data.

Complexity Outpacing the Experts

For all the fervor around the technologies enabling big data, people are still very much turning the gears.

A look at the day-to-day tasks that keep the architecture running paints a clearer picture. Nearly all essential data infrastructure and compute technologies today are open source, and open source software tends to arrive as parts rather than complete solutions. That means people with advanced technical skills are needed to make all that complexity work together cohesively.

As anyone who has had to grow a team knows, humans don't "scale out" the way modern infrastructure does. No matter how your data teams are structured, you need experts who understand both the infrastructure -- each node, cluster and grouping of clusters -- and the data itself and how it will be used. A wide range of technologies -- many of them open source -- must be pieced together into coherent processes for serving data to the analysts, engineers and developers who need it.

In a given week, any number of things can go wrong:

  • ETL pipelines break
  • Query performance lags
  • Data is wrong, late or corrupt
  • Necessary APIs aren’t there
  • Changing business requirements mean integrating new data processing applications

Those not-particularly-strategic fire drills, coupled with time-consuming capacity planning and software updates, keep organizations from the strategic work of scaling and operationalizing their big data efforts.

Helping the Humans

As you add more people to meet the demands of scale, each individual becomes slightly less effective due to the overhead of coordination, collaboration, management and communication -- and before you know it, your inexpensive storage and free open source technologies have become quite costly to maintain. Your people quickly find themselves overmatched by the demands of the organization.

As an organization scales, keeping the machine running grows exponentially more complex. At large scale, running multiple workloads that compete for fixed resources is devilishly hard to manage and even harder to debug when something breaks. There are simply too many places where things can go wrong in a complex data operation.

The future, however, isn't to replace these people. Robots are not taking these jobs.

The people behind the modern big data organization have the understanding, context and experience that won't be replaced by algorithms any time soon. But these people need help.

There are two paths for relieving the pressure and enabling data initiatives to grow.

First, use automation to make your people more effective. Let them focus on high-value tasks while automating away the mundane to improve your economies of scale.
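As a concrete illustration of "automating away the mundane," consider the routine morning question of whether yesterday's data actually landed. The sketch below is a minimal, hypothetical example -- the table name, threshold and alerting behavior are all assumptions, not a prescribed tool -- of turning that manual check into a script that can run on a schedule and only demand a human's attention when something is actually wrong.

  # Minimal sketch (hypothetical names): automate a routine "is the data late?" check
  # so an engineer doesn't have to eyeball it every morning.
  import sqlite3
  from datetime import datetime, timedelta, timezone

  MAX_AGE = timedelta(hours=6)  # assumed freshness threshold

  def latest_load_time(conn: sqlite3.Connection) -> datetime:
      # Assumes a load_audit table that ETL jobs append to after each successful run.
      row = conn.execute("SELECT MAX(loaded_at) FROM load_audit").fetchone()
      return datetime.fromisoformat(row[0]).replace(tzinfo=timezone.utc)

  def check_freshness(conn: sqlite3.Connection) -> None:
      age = datetime.now(timezone.utc) - latest_load_time(conn)
      if age > MAX_AGE:
          # In practice this would page someone or post to a chat channel;
          # printing keeps the sketch self-contained.
          print(f"ALERT: data is {age} old (threshold {MAX_AGE})")
      else:
          print(f"OK: data is {age} old")

  if __name__ == "__main__":
      # In-memory stand-in for a real warehouse, purely for demonstration.
      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE load_audit (loaded_at TEXT)")
      conn.execute("INSERT INTO load_audit VALUES (?)",
                   (datetime.now(timezone.utc).isoformat(),))
      check_freshness(conn)

The point isn't this particular check; it's that each check like it removes one recurring interruption from the experts' week.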

Second, reorganize so that your data teams are decoupled from the day-to-day demands of analysts, enabling more agility and, ultimately, faster time to insight. When data teams are turning the crank as a service organization, bottlenecks emerge. Instead, data organizations should take a page from the DevOps playbook -- decouple data teams from users, which ultimately lets both groups work more efficiently.

Until we figure out the processes that lead to a “DataOps” model, organizations will continue to struggle to scale and operationalize their data efforts.
