Bad Data, Bad Data Flows Still Plague Many Firms
Enterprises of all sizes struggle with a range of key data flow performance management issues, from stopping bad data to keeping data flows operating effectively, according to a new survey by Dimensional Research.
Most of the 300 data management professionals surveyed (87%) report that bad data flows into their data stores, while just 12% consider themselves good at the key aspects of data flow performance management.
The survey, sponsored by StreamSets, points to pervasive data pollution, which implies that analytic results may be wrong, leading to false insights that drive poor business decisions.
Even when companies can detect their bad data, cleaning it after the fact wastes data scientists' time and delays the data's use, the study notes.
Respondents cited ensuring data quality as the most common challenge they face when managing big data flows (selected by 68% of respondents). About three-quarters of the organizations said they currently have bad data in their stores, despite cleansing data throughout the data lifecycle. While 69% of organizations consider the ability to detect diverging data values in flow "valuable" or "very valuable," only 34% rated themselves "good" or "excellent" at detecting those changes.
And while detecting bad data is a critical aspect of data flow performance, the survey showed that enterprises' struggles extend much further. Only 12% of respondents rate themselves "good" or "excellent" at detecting a down pipeline, throughput degradation, increases in error rates, data value divergence, and personally identifiable information (PII) violations.