The world of Big Data is expanding rapidly as software and technology become advanced enough to let people who are not data scientists manage and utilize the vast amounts of data that are out there.
The advent of social media, online retail and more is expanding our digital footprint, leading to an expected doubling of the digital universe every year and a projected 40 trillion gigabytes by 2020. As methods of collecting and computing Big Data continue to evolve and become more sophisticated, I’ve seen an increase in the impact Big Data has on our daily lives. It has swept into every industry and become a vital resource that affects the operation and innovation of many businesses.
Data is collected through a growing number of different channels, including mobile applications, websites and face-to-face interaction. Accurate, reliable and timely information is a must for effective decision making regardless of whether it is individuals, communities, governments, organizations or businesses making decisions.
Big Data is also impacting government at the national, state and local levels. One example is the White House’s Smart Cities Initiative designed to spur the innovation of technology and the use of Big Data to address issues cities across the nation are facing. This includes revitalizing energy grids, improving infrastructure and traffic patterns, and expanding health care access.
As the amount of Big Data that is collected increases and its uses expand, so must the scrutiny of the collected data. Big Data applications with high-performance hardware and software are valuable in building more efficiencies into queries. However, they are not great at ensuring data quality.
Common errors, including incomplete or missing data, outdated information and inaccurate data, plague 91 percent of organizations globally. Due to these errors, it is believed that approximately 22 percent of data being collected is inaccurate.
That percentage is higher among U.S. companies at 25 percent. This can lead to inefficiency in customer loyalty programs and business intelligence, as well as hurt a company’s bottom line. To reap the full benefits of Big Data in today’s business environment, it is essential to establish processes for substantiating data.
Data quality depends not only on its own features but on the business environment using the data (business processes and users). Some of the common problems that cause bad data are conflicting data, sources or systems and business rules that are applied inconsistently.
To ensure quality data is being collected, businesses must confirm the data meets six important criteria: consistency, accuracy, completeness, timeliness, aligned purpose and relevant use. Data that falls short on any of these six criteria suffers in overall quality.
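To make the criteria concrete, here is a minimal sketch of what automated checks for four of them might look like. The record fields, reference values and thresholds are all hypothetical, chosen only for illustration; a real implementation would draw them from the business rules mentioned above.

```python
from datetime import datetime

# Hypothetical record: a customer entry collected from a web form.
record = {
    "email": "jane.doe@example.com",
    "country": "US",
    "signup_date": "2017-06-01",
    "age": 34,
}

REQUIRED_FIELDS = {"email", "country", "signup_date", "age"}
VALID_COUNTRIES = {"US", "CA", "MX"}   # consistency: an agreed reference set
MAX_AGE_DAYS = 365                     # timeliness: data older than a year is stale

def check_completeness(rec):
    """Completeness: every required field is present and non-empty."""
    return all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS)

def check_consistency(rec):
    """Consistency: values conform to the agreed reference set."""
    return rec.get("country") in VALID_COUNTRIES

def check_accuracy(rec):
    """Accuracy: values fall within plausible ranges."""
    return isinstance(rec.get("age"), int) and 0 < rec["age"] < 120

def check_timeliness(rec, now=None):
    """Timeliness: the record was collected recently enough to use."""
    now = now or datetime.utcnow()
    collected = datetime.strptime(rec["signup_date"], "%Y-%m-%d")
    return (now - collected).days <= MAX_AGE_DAYS
```

Aligned purpose and relevant use are harder to automate; they depend on the business context in which the data is applied, which is why the criteria are a checklist for people and processes, not just for code.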
All data technologies are affected by the quality of data taken in. Take, for instance, the cookbook compiled by IBM’s Watson. The supercomputer scoured the internet for recipes and flavor combinations that could be compiled to form delicious and intriguing meals. But much of this information was pulled from unknown sources, which could bring into question how these recipes might actually taste.
To help guarantee the quality of the recipes, IBM solicited the help of the Institute of Culinary Education. This included testing the limits of the recipes and developing strategies for cognitive cooking. In short, a validation process was put in place.
Like IBM, we too need to take steps to verify collected data. So what can we do?
To ensure data quality, companies need to define the specifics of good data and establish rules for certifying the data.
- Define specific requirements for good data where it is used
- Establish rules for certifying that data
- Integrate those rules into the existing workflow
- Continue to monitor and measure data quality
- Test:
  - Unit testing during development
  - Systems integration testing
  - Data validation
  - User acceptance testing
  - Performance testing
  - Retesting with future development
At a higher level:
- Assign ownership to the processes
- Standardize the processes
- Consider a cross-functional data council
Once companies have established these rules, they need to continue to monitor and measure the incoming data. This will help rectify any problems that may arise during the collection process, such as repetitive, incomplete or irrelevant data.
Data also needs to be tested, and this should be done throughout development. This includes validating data and testing systems integration and performance.
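The steps above — integrating rules into the workflow, quarantining failures and monitoring quality — can be sketched as a small ingestion routine. The rule functions, field names and sample batch below are assumptions for illustration, not a prescribed implementation.

```python
# Minimal sketch: wire validation rules into an ingestion step,
# quarantine failing records for review, and track a quality metric.

def is_complete(rec):
    # Incomplete or missing data is one of the most common errors.
    return all(rec.get(f) not in (None, "") for f in ("id", "email"))

def is_valid_email(rec):
    # A crude accuracy check; real rules would be stricter.
    return "@" in rec.get("email", "")

RULES = [is_complete, is_valid_email]

def ingest(records):
    """Apply every rule; route failures to quarantine instead of dropping them."""
    accepted, quarantined = [], []
    for rec in records:
        if all(rule(rec) for rule in RULES):
            accepted.append(rec)
        else:
            quarantined.append(rec)
    return accepted, quarantined

batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},            # incomplete: fails is_complete
    {"id": 3, "email": "no-at-sign"},  # inaccurate: fails is_valid_email
]
good, bad = ingest(batch)
quality = len(good) / len(batch)       # monitored metric: share of clean records
```

Tracking a simple ratio like `quality` over time is one way to "monitor and measure" incoming data: a sudden drop flags a collection problem before flawed data reaches business intelligence.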
As our world becomes more connected and the amount of available data increases, companies must make sure they are developing their own processes for collecting the best data. Without verification, the chances of using flawed data increase, which can lead to skewed results, a tarnished brand or lost profit.
Only by taking steps to authenticate data can businesses ensure they are drawing accurate conclusions.
(About the author: Karen Peters is the program dean and director of the Data Institute for University of Phoenix College of Information Systems and Technology.)