Getting the Critical Role of Data Preparation Right
A new report from TDWI examines the role data preparation plays in using data to drive business decisions.
In “Improving Data Preparation for Business Analytics,” TDWI’s David Stodder looks at technologies and methods for establishing “trusted data assets.” The topic matters to any business that hopes to make decisions and take action based on the data it possesses.
“The topic is attracting attention because many of the problems business users and analysts in organizations confront when working with data crop up during data preparation processes,” Stodder notes. “The BI and data management teams of IT are also burdened by problems with poor, ill-defined data and hard coding of preparation and transformation routines. Better data preparation can remedy these problems and help both business and IT become more productive and more effective.”
The challenges with data preparation are growing, Stodder notes, as data is created in “an increasingly mobile and fast-changing environment.”
In response, self-service data analytics tools are becoming more popular. They require less IT attention and enable organizations to personalize the experience of working with data through data visualization. Such tools also make it easier for non-IT individuals to work with data. Some of these tools use machine learning, natural language processing, and other advanced techniques to suggest data sets and guide users.
Equally important, data preparation needs to address data governance. As Stodder notes, “data governance is often regarded as being primarily about protecting sensitive data and adhering to regulations; indeed, data preparation processes are vital to meeting those priorities. However, data governance is expanding to include stewardship of data quality, data models, and content such as visualizations that users create and share.”
Significantly, more than a third of the companies that TDWI surveyed for the report said they are dissatisfied with their own data preparation efforts. The number one reason given is lack of adequate budget. The second most common barrier is not having a strong enough business case.
So how does an organization improve its data preparation processes? It starts by focusing on accuracy, quality, and validity.
“This group of attributes is at the heart of data preparation and transformation processes and becomes particularly critical as multiple sources are integrated and blended for BI and analytics,” Stodder says. “To be accurate, data values must be correct and in the right form; otherwise data could be wrong or invalid. Data validation constraints and processes help ensure that clean and correct data is added to data sets.”
Beyond that, the most important factors in data preparation are:
- Frequency of data refresh
- Availability of access
- Conformance to data formats
- Completeness and depth
- Consistency across data sets
- Flexibility to change data for ad hoc needs
- Level of duplication
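As a rough illustration of how a few of these attributes might be measured, the sketch below profiles a small set of records for completeness, conformance to a date format, and duplication. It is not taken from the TDWI report: the records, the field names, the ISO date format, and the `profile` function are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical records; the field names and values are invented for illustration.
RECORDS = [
    {"id": "1", "email": "ana@example.com", "signup_date": "2016-07-01"},
    {"id": "2", "email": "", "signup_date": "2016-07-02"},
    {"id": "3", "email": "raj@example.com", "signup_date": "07/03/2016"},
    {"id": "1", "email": "ana@example.com", "signup_date": "2016-07-01"},
]

# Assumed target format for dates: YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def profile(records, required=("id", "email", "signup_date")):
    """Return simple completeness, format-conformance, and duplication measures."""
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "date_conformance": 0.0, "duplicate_rows": 0}

    # A record is complete when every required field is present and non-empty.
    complete = sum(all(r.get(f) for f in required) for r in records)

    # A record conforms when its date matches the assumed format exactly.
    conforming = sum(
        bool(ISO_DATE.fullmatch(r.get("signup_date", ""))) for r in records
    )

    # Count rows that repeat an earlier row field for field.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(n - 1 for n in counts.values())

    return {
        "completeness": complete / total,
        "date_conformance": conforming / total,
        "duplicate_rows": duplicates,
    }


report = profile(RECORDS)
print(report)
```

Checks of this kind are typically the sort of work that self-service preparation tools aim to automate, so that users do not have to script them by hand.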
Stodder is a firm believer in self-service data preparation tools. He notes that they enable users to “do more on their own to serve their BI, data discovery, and data analytics needs. New technologies for self-service data preparation are automating processes so that users have less need for manual work in finding the right data among a variety of incoming sources, cleansing it, and transforming the data.”
Perhaps most importantly, “self-service data preparation can enable users to be less dependent on IT and data specialists in their own organizations, not only for their expertise but for their time,” Stodder says.
“Self-service data preparation could relieve some IT burdens, as long as IT is confident in giving up some control of process,” Stodder continues. “Ideally, users and IT will work together to ensure that self-service enables users to be more productive but not increase data chaos and duplicative, uncontrolled work.”