How to spot bad data and know its limitations when it's good


Accurate and reliable data can bring context to research studies, help people understand trends, show business managers which efforts are helping them reach company goals, and much more. However, not all data is as beneficial as it first seems. Bad data can cancel out all the benefits that trustworthy information provides.

Sometimes data has glaring flaws that IT decision-makers spot right away. For example, they might find misspelled names, instances where an entry appears two or more times in a list but should appear only once, or date-related inaccuracies.

If the date field of a spreadsheet indicates that February has 30 days, that's a clear oversight. However, other telltale signs of bad data aren't always evident from visual inspection alone.
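The obvious errors mentioned above, such as duplicate entries, misspelled names and impossible dates, can also be caught programmatically. The Python sketch below uses only the standard library and is one possible approach, not a prescribed method; the record layout, the sample records and the 0.85 similarity threshold are hypothetical choices for illustration. It flags exact duplicates with `collections.Counter`, rejects impossible dates by letting `datetime.date` raise `ValueError`, and flags near-identical names that may be misspellings using `difflib.SequenceMatcher`.

```python
from collections import Counter
from datetime import date
from difflib import SequenceMatcher

# Hypothetical records: (customer_name, year, month, day)
records = [
    ("Alice Smith", 2023, 2, 14),
    ("Alice Smith", 2023, 2, 14),  # exact duplicate
    ("Alise Smith", 2023, 3, 1),   # possible misspelling of "Alice Smith"
    ("Bob Jones", 2023, 2, 30),    # February 30 does not exist
]

def find_duplicates(rows):
    """Return each row that appears more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]

def find_invalid_dates(rows):
    """Return rows whose (year, month, day) is not a real calendar date."""
    bad = []
    for row in rows:
        _, year, month, day = row
        try:
            date(year, month, day)  # raises ValueError for impossible dates
        except ValueError:
            bad.append(row)
    return bad

def find_similar_names(rows, threshold=0.85):
    """Return pairs of distinct names whose similarity ratio meets the
    (hypothetical) threshold, as candidates for human review."""
    names = sorted({row[0] for row in rows})
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs

print(find_duplicates(records))     # [('Alice Smith', 2023, 2, 14)]
print(find_invalid_dates(records))  # [('Bob Jones', 2023, 2, 30)]
print(find_similar_names(records))  # [('Alice Smith', 'Alise Smith')]
```

A similarity check only suggests candidates: two genuinely different people can have similar names, so flagged pairs still need a person to confirm them.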

Data May Be Biased

People who work with data may unintentionally or purposefully look for data that supports their existing findings or theories. When this happens unconsciously, it's a phenomenon called confirmation bias, in which people seek out and notice information that aligns with their views while ignoring material that does not.

Using data responsibly and staying alert to potential bad data means working hard to counteract bias, which may include asking someone with a neutral stance to fact-check the material.

The Danger of Untrusted or Desperate Sources

The concept of fake news returned to public prominence with Donald Trump's election and administration. Spotting bad data may mean verifying the sources behind specific, externally gathered statistics.

If a person reads a headline declaring that 95 percent of businesses were attacked by hackers during a given year, they might accept it as fact without checking the source of that alarming claim. If the figure came from a news outlet such as The New York Times, The Washington Post or The Guardian, the reader has more reason to trust it, because those well-known outlets would not knowingly damage their reputations by publishing inaccurate news.

However, what if those hacking statistics came from a security firm that set up shop in the community only last month and had a history of misleading tactics in the United Kingdom before relocating to the United States? In that case, the company might be lying or using other questionable strategies to frighten people about getting hacked so that they feel compelled to buy its services.

Scientists have also manipulated data in research studies to support particular conclusions. Other investigators then spend time and money trying to replicate the findings, only to realize they were chasing a hopeless cause. Companies that cite the data could also be accused of fraud merely through their association with the doctored statistics.

Evaluating the trustworthiness of data involves determining whether the people or company responsible for publishing it might be trying to enhance their reputation, bolster profits or gain some other advantage beyond keeping people informed.

How Bad Data Causes Harm

Data scientists, marketing managers and other people who work with data aren't always honest about its limitations, and gaps in how the data is managed can introduce inaccuracies. If decision-makers put too much weight on flawed data, they may make mistakes and become less confident about using data to inform their conclusions in the future.

A 2016 survey of CEOs found that 84 percent of them were concerned about the quality of the data they used when making decisions. They have good reason to be wary: bad data can have financial repercussions if business leaders place too much trust in information that ultimately proves flawed.

It's also crucial to consider the time that bad data wastes. When professionals engage in data-driven marketing, they may be relying on data distorted by non-human activity from bots or malware. If that happens, they could form false impressions of customers' journeys through a website or of the factors that cause visitors to linger on some pages rather than others.
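To illustrate how non-human traffic can skew a metric such as time on page, the Python sketch below computes an average with and without sessions flagged as bots. The log entries and user-agent strings are invented, and the keyword match on the user-agent string is a deliberately crude, hypothetical heuristic: it only catches bots that identify themselves, whereas malware and more sophisticated bots often imitate real browsers.

```python
from statistics import mean

# Hypothetical page-view log: (user_agent, page, seconds_on_page)
page_views = [
    ("Mozilla/5.0 (Windows NT 10.0)", "/pricing", 45),
    ("Mozilla/5.0 (Macintosh)", "/pricing", 60),
    ("ExampleBot/1.0", "/pricing", 0),
    ("ExampleCrawler/1.0", "/pricing", 1),
    ("Mozilla/5.0 (iPhone)", "/pricing", 30),
]

# Crude, hypothetical heuristic: keywords that self-identified bots
# often include in their user-agent strings.
BOT_KEYWORDS = ("bot", "crawler", "spider")

def looks_like_bot(user_agent):
    """Return True if the user agent contains any bot keyword."""
    ua = user_agent.lower()
    return any(word in ua for word in BOT_KEYWORDS)

def average_time_on_page(views, exclude_bots):
    """Mean seconds on page, optionally skipping bot-flagged views."""
    times = [secs for ua, _, secs in views
             if not (exclude_bots and looks_like_bot(ua))]
    return mean(times)

print(average_time_on_page(page_views, exclude_bots=False))  # 27.2
print(average_time_on_page(page_views, exclude_bots=True))   # 45
```

In this small sample, just two automated hits pull the apparent average from 45 seconds down to 27.2, which could lead a marketer to misjudge how engaging the page really is.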

There are reputational risks, too. If a company releases public research that is later proven inaccurate, it will struggle to earn trust for any material it publishes in the future.

When business leaders trust data blindly, especially when making decisions, they inevitably set the stage for problems. Staying aware of the characteristics of bad data discussed here is an excellent first step toward being proactive. Furthermore, people who work with data must be mindful of its limitations and be honest when disclosing those shortcomings to others.
