In the two years since Gartner proclaimed “self-service data preparation” to be an essential component of the broader analytics market, the technology and competitive landscape has changed dramatically.
There are now dozens of vendors offering stand-alone data prep tools, data prep features integrated into business intelligence (BI) and data science platforms, and data prep utilities added to data visualization and data quality solutions. Solutions are available on premises or in the cloud. Some offer rich scripting and data mining features, while others emphasize automation and modern user experiences designed for non-technical business users.
Data types used for analytics are evolving just as quickly, with big data, streaming data and machine data adding to the ongoing challenge of analyzing enterprise application data, log data, web information and historical data locked in reports and documents.
Meanwhile, BI analysts are struggling to keep up with business requests, data scientists remain in chronically short supply, and business analysts continue to grapple with data access, data blending and data reconciliation. These analysts often cannot find the information they need and are unaware of the self-service data prep tools available to improve their productivity.
At the same time, social technology and social features have forever transformed the way we live and work. Social media platforms have dramatically raised people’s expectations about the availability and timeliness of information. Users increasingly hold the same expectations for business information, regardless of where the data resides or how it is formatted. They demand instant access to information and the ability to share it easily with key stakeholders.
Rapid innovation in the self-service data preparation market has delivered many advances, including: access to dark data locked in PDF documents and to semi-structured data within enterprise reports, log files, invoices and machine data; point-and-click access to critical business data sources; drag-and-drop access to web and third-party data; direct export of analytics-ready data to data visualization tools and BI platforms; and built-in automation and governance functionality for security and compliance.
However, most analysts and operations employees can access only personal data sources, historical reports or information tightly controlled by IT and BI gatekeepers. Teams are also limited to sharing what little data they have through personal Excel spreadsheets, which raises compliance concerns and erodes trust in their analysis.
Enter data socialization. Data socialization is an evolution in data accessibility and self-service across individuals, teams and organizations that is reshaping the way organizations think about, and employees interact with, their business data.
Data socialization involves a data management platform that combines self-service visual data preparation, data discovery and cataloging, and automation and governance features with key attributes common to social media platforms, such as the ability to leverage user ratings, recommendations, discussions, comments and popularity to make better decisions about which data to use. It enables data scientists, business analysts and even novice business users across a company to search for, share and reuse prepared, managed data to achieve true enterprise collaboration and agility.
In this “data utopia,” any employee can easily find and use ANY data that has been made accessible to them within their data ecosystem. The result is a social network of certified, curated and raw data sets, with controls and limitations defined for each individual. This fosters a culture of data access in which users learn from each other and become more productive and better connected as they source, cleanse and prepare data for analytical and operational processes. Ultimately, it enables organizations to accelerate analytics outcomes and make better, faster business decisions.
Other key characteristics of data socialization include: the ability to understand the relevance of data based on how different user roles in the organization (e.g., sales operations or internal auditing) use it, to follow key users and data sets, and to collaborate to better harness the “tribal knowledge” that too often goes unshared; the ability to search cataloged data, metadata and data preparation models, indexed by user, type, application and unique data values, to quickly find the right information; and machine learning that identifies patterns of use and success, performs data quality scoring, suggests relevant sources and automatically recommends likely data preparation actions based on user persona.
Increasingly, business applications are adopting social features to improve business collaboration, making individuals and organizations more informed, agile and productive. Data socialization brings those benefits to self-service data preparation and, ultimately, to self-service analytics by eliminating the common barriers to data access and sharing. It creates a “data utopia” for BI analysts, data scientists and business users that boosts their productivity and speeds decision making. Just think of the possibilities…
Data Socialization In Action
Data socialization empowers business users, data scientists, analysts and IT throughout the enterprise to work together with their data.
IT and BI – Face the challenge of balancing the ongoing need for standard reporting with the ever-growing demand for investigative and ad hoc analysis. While maintaining data governance and regulatory compliance, IT and BI organizations can improve response times and deliver more value to a broader set of business and technical users, while taking full advantage of their existing BI, big data and back-office investments.
Analysts and data scientists – Need to reclaim lost hours and improve productivity. With the ability to acquire and prepare data from any source, eliminate or automate redundant work across silos, and share techniques and curated data with their peers as they work, they can improve data quality and build trust in their analytics.
Information workers – Want to use data daily to make business decisions but often lack the technical skills to access and prepare it; with data socialization, they will no longer be left out of the process. By following the activities of peers and leveraging shared, curated resources, information workers can source and interpret information more effectively and rapidly build their skills as they go about their daily tasks.
(About the author: Michael Morrison is the president and chief executive officer of Datawatch Corporation, where he’s responsible for driving the vision and strategy for the company’s business growth and market leadership. Prior to Datawatch, Morrison was vice president, financial performance management at IBM, and held a similar position at Cognos prior to its acquisition by IBM in 2008.)