DataOps is the key to success in the era of AI and machine learning
It’s no secret that many a blogger, speaker and vendor has weighed in on what artificial intelligence is and what it is not. Is it or is it not the new electricity? The new BI? The new UI? The new black?
Whatever it is, we know that, after many false starts, AI has started delivering real value by modernizing data analytics. And we know that the fuel for the next generation of intelligence on earth is data.
Our ability to tap into vast sources of previously untapped and unrelentingly increasing “big data” has galvanized throngs of citizen data scientists into democratizing machine learning to train predictive analytic engines. The long-predicted arrival of large-scale data storage and compute in the cloud is, at long last, real and makes AI economically feasible. And innovations in message brokers and “streaming platforms” have sped up traditional business intelligence, bringing AI to the next generation of smart applications.
While the business value of AI is easy to understand and evangelize, the operational implications of these trends are far more complex than is generally understood.
What’s driving this complexity? Applications are becoming agile, smart and more focused, driven by the trend towards microservices. At the same time, unlike oil, data in these applications becomes more valuable with reuse, and its semantics change frequently. Infrastructure and data platform choices are being made and remade autonomically, with little regard to the broader impact on the enterprise, leading to unexpected changes in the structure and location of data.
Having the right data at the right time and with the right level of confidence at the point of use is priceless, but all these unexpected, unannounced and unending changes to data, collectively termed data drift, are beyond our control and lead to operational risk.
While DataOps is still in its early days, I believe that in 2020 we will see more pervasive interest in it. DataOps is the set of practices and technologies that brings the end-to-end automation and monitoring sensibilities of DevOps to data management and integration.
But what makes it DataOps is drift-resilient smart data pipelines, from which living, breathing end-to-end data topologies emerge. Instead of ignoring or fighting data drift, DataOps embraces and harnesses it to speed up data analytics, with confidence.
Some indicators that we’ve noticed here at StreamSets include a small but burgeoning cross-section of customers embracing DataOps approaches. The recent DataOps Summit highlighted many of their use cases and the resulting business impact.
Searches for the term “DataOps” have tripled, vendors are entering the space with DataOps offerings, and we’re seeing a number of DataOps business titles appearing on LinkedIn profiles. All point to an emerging understanding of “DataOps” and recognition of its nomenclature, leading to the practice becoming something that data-driven organizations refer to by name. The new book DataOps: The Authoritative Edition outlines much of this in detail.
DataOps is the foundation upon which all software will be built in the future, teasing an inherent order and discipline out of the chaos that's otherwise caused by agile, autonomic technology decisions. Enterprises today must think ahead to implement the technologies that will enable DataOps practices if they are to survive and thrive in the era of artificial intelligence and data dominance.