The importance of maintaining transparency in data analysis


Data analysis relies on accurate collection and processing methods. Despite the most meticulous data collection methods, mistakes are inevitable. Even with the help of software, the process of collecting data must be engineered by people, and people are prone to conscious and unconscious bias.

Some types of bias are difficult to identify, so business leaders and data team members need to be ready to take steps to minimize the effects.

Transparency is the fundamental key to mitigating the potential for biased data collection and analysis. Transparency gives data team members the opportunity to spot errors and work to correct unconscious bias.

Transparent collection methods

For data to be accurate, it must be collected carefully and completely. What constitutes complete data collection is subjective. For instance, collecting data through a web form will only provide you with the data you specifically ask for. You might be unaware that you’re not collecting enough (or the right) data.

When everyone working with the data can see what data is being collected, it increases the chances of spotting errors. For instance, if your company is profiling website visitors but isn’t collecting visitor locations, someone on the marketing team will notice this missing data and point out that you might have an overseas market you’re unaware of.
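This kind of transparency check can be automated. As a minimal sketch (the field names and visitor record below are hypothetical), a script can compare each collected record against the list of fields the team agreed to collect, so gaps like a missing location surface immediately:

```python
# Sketch: flag fields the team agreed to collect that a record is missing.
# EXPECTED_FIELDS and the sample record are hypothetical, for illustration.

EXPECTED_FIELDS = {"name", "email", "company", "country"}

def missing_fields(record):
    """Return the agreed-upon fields absent from a collected record."""
    return sorted(EXPECTED_FIELDS - record.keys())

visitor = {"name": "A. Lee", "email": "a@example.com", "company": "Acme"}
print(missing_fields(visitor))  # location ("country") isn't being collected
```

Publishing the expected-field list alongside the data gives every team member a concrete artifact to question, rather than relying on someone to notice an absence by intuition.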

Accuracy depends on running the right calculations

Running calculations doesn’t mean much unless they’re the right calculations. Spreadsheets such as Excel make it easy to capture and organize data in ways that reveal which calculations should be run, and working through those calculations deliberately is a large part of a data analyst’s accuracy.

There’s a reason Excel has been the top spreadsheet program for years, despite Google launching “Sheets.” Google Sheets appears to be the better choice because it’s easier to use, convenient, and allows for collaboration. However, the people raving about Google Sheets aren’t relying on it for complex data analysis.

Serious data crunchers, on the other hand, wouldn’t survive a day without Excel, nor would they survive using Google Sheets in its place. Excel offers a deep set of basic and advanced functions for parsing data, like string functions that pull any number of characters from strings of varying lengths. The LEFT, RIGHT, MID, and LEN functions level up the ability to parse data, and although it seems complex, organizing data in multiple ways is essential for figuring out what it means.
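To make the parsing concrete, here is a short Python sketch of what LEFT, RIGHT, MID, and LEN do (the sample product code is hypothetical, and MID is 1-based to match Excel’s convention):

```python
# Python equivalents of Excel's LEFT, RIGHT, MID, and LEN string functions.
# The sample code "US-2024-0381" is hypothetical, for illustration only.

def left(s, n):
    return s[:n]                        # LEFT(s, n): first n characters

def right(s, n):
    return s[-n:] if n else ""          # RIGHT(s, n): last n characters

def mid(s, start, n):
    return s[start - 1:start - 1 + n]   # MID uses a 1-based start, like Excel

def length(s):
    return len(s)                       # LEN(s): number of characters

code = "US-2024-0381"
region = left(code, 2)    # "US"
year = mid(code, 4, 4)    # "2024"
order = right(code, 4)    # "0381"
print(region, year, order, length(code))
```

Splitting one packed field into region, year, and order number is exactly the kind of reorganization that reveals which calculations are worth running next.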

Unfortunately, the importance of manipulating data in a spreadsheet to figure out which calculations to run has been lost on some of today’s CFOs. The Wall Street Journal reported Adobe Inc.’s employees are being instructed to stop using Excel for “financial planning, analysis, and reporting.” Instead, CFO Mark Garrett wants his people to focus on what the data means. There’s just one problem: organizing data is a critical first step in figuring out what data means.

While the sentiment is that cloud-based collaborative software is better because it updates automatically and connects with existing accounting and resource-management systems, it isn’t an equivalent replacement for hands-on spreadsheet work.

Misleading data can be backed by real numbers

Numbers don’t lie, but interpretations and presentations can distort those numbers into partial truths.

Datapine explains, in detail, several ways data can be misleading, including:

  • Faulty polling
  • Flawed correlations
  • Misleading data visualizations
  • Purposeful and selective bias
  • Data fishing
  • Using percentage of change with a small sample size

Skewed data isn’t a mistake limited to big data analysts. It’s a bias embedded directly into the fabric of our science.

In 2009, Dr. Daniele Fanelli of the University of Edinburgh published a meta-analysis of survey data in which up to 33.7% of scientists admitted to questionable research practices, such as modifying results to improve outcomes, interpreting data subjectively, or withholding analytical details.

Nobody, not even hardcore scientists, is immune from injecting bias into data analysis.

Perfect data analysis isn’t possible, and that’s okay

Data analysis will never be perfect as long as humans are responsible for the process. The best anyone can do is set up systems that make it easier to identify and remedy bias when it’s spotted.

The best system for tackling bias is one that lets team members provide input whenever they notice something off about the data. It’s in the company’s best interest to encourage teams to question that data and to offer insight into solutions.
