Big data is an emerging data technology receiving considerable attention from software vendors, industry analysts and data professionals for its potential to revolutionize company operations and customer insights. Big data is also changing how we will architect our data infrastructure for the future, as well as how we think about data quality. Data professionals and chief data officers are involved in, and often leading, many of these initiatives inside their companies.
Often overlooked are other emerging technologies that can affect data quality and data management practices -- both positively and negatively. Data professionals would be wise to understand these new technologies and proactively participate in their deployment.
Specifically, there are four emerging technologies to watch:
- Crowdsourcing technologies
- Mobile applications
- Cloud solutions (both software as a service and platform as a service)
- In-memory databases
As a data management professional, ask yourself these questions when considering each of the four areas:
- How can this technology fit within my overall data strategy?
- Can the implementation of this technology inside my company have a negative impact on data quality in some way?
- How can I exploit this technology to improve data quality management?
Crowdsourcing is a method of using the openness and access of the Internet, combined with a large network of people, to solve a problem or solicit ideas. It combines the efforts of numerous self-identified volunteers or part-time participants (i.e., the crowd). For example, crowdsourcing solutions are being used today to distribute a complex problem across the Internet, taking advantage of individual computing resources, talents and experiences. Perhaps the most visible example of crowdsourcing is crowd funding. Crowd funding helps finance charitable and commercial causes by sourcing “micro” funds from many people via the Internet. Crowdsourcing is typically supported by applications, technologies and a specific organization (the platform) that brings together the project initiator and the crowd.
In the data management arena, crowdsourcing companies collect and validate information from the various available Internet sources, including social media. Instead of relying on bots to trawl the Internet, they use people who can make valuable decisions about the data they find. Because workers are sourced around the world, crowdsourcing companies can perform these tasks in a cost-effective and timely manner.
Traditionally, companies hire their own back office data operations employees (often in low-cost countries) to enrich their customer account and contact information. Crowdsourcing is a cost-effective alternative to this method. The “crowd” is dispersed all over the world, making it easier to access local information for address verification and other regional company verification. The cost per transaction using the crowd is usually below that of the traditional resourcing model. However, crowdsourcing cannot enrich all information; it can only draw on information that is available on the Web. Also, the overall accuracy and completeness of the results improve as the volume of requests given to the crowd grows.
While not for every problem, crowdsourcing provides an alternative to data enrichment that is worth considering for certain data management challenges.
Consider how the crowdsourcing method can be used inside your company to improve data management processes. How can the “crowd” of company employees participate in improving data quality and data usability processes?
In today’s data management processes, data experts devise ways to measure the relevancy and currency of information stored in the companies’ databases. Methods are developed to score information based on recent usage. “Best record” methods are developed to identify duplicates and determine the surviving records. All these methods take time to develop and implement.
Also consider that today most employees find it very difficult to report on the quality or usefulness of the data in a database. Employees must first track down who in the company is responsible for the quality of the information, or find the appropriate data owners. Most simply give up and pursue other means to get the information they need, keeping the problem to themselves.
The crowdsourcing approach can be used to improve the relevancy, accuracy and usefulness of data by making it easier for employees to rate the information and making that rating viewable to everyone within the organization. Using a “like” approach (think Facebook) would allow all employees to rate the usefulness of the data in the core applications. While software application vendors would need to provide some capability to do this inside packaged applications, as data management professionals we should be requesting this capability.
If developing your own in-house applications where customer, employee and product information is created and maintained, consider including a rating approach to engage all your employees in data quality management. Put your own crowd to work.
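The rating idea above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's API: the record IDs, employee IDs and method names are all invented for the example.

```python
from collections import defaultdict

class RecordRatings:
    """Track 'like'-style usefulness ratings for master data records."""

    def __init__(self):
        # record_id -> set of employee IDs, so each employee counts once
        self._likes = defaultdict(set)

    def like(self, record_id, employee_id):
        self._likes[record_id].add(employee_id)

    def score(self, record_id):
        return len(self._likes[record_id])

    def rank(self, record_ids):
        # Surface the most "liked" (most useful) records first
        return sorted(record_ids, key=self.score, reverse=True)

ratings = RecordRatings()
ratings.like("acct-001", "alice")
ratings.like("acct-001", "bob")
ratings.like("acct-002", "alice")
print(ratings.rank(["acct-002", "acct-001"]))  # ['acct-001', 'acct-002']
```

Exposing a score like this alongside each record is what would let search results list the most relevant data first.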
As smartphones and tablets become as ubiquitous as cellphones, mobile applications are growing in staggering numbers. While extremely popular for personal use, companies also realize the business value of these devices and are transforming them into mobile offices for their employees. Business applications are being developed for all aspects of an employee’s work day, and not just for those who travel frequently. Mobile applications for sales management, employee HR transactions, customer interactions and marketing are areas where master data can be created or updated. Therefore, data professionals should ensure their upfront involvement in the development of these solutions. These mobile applications have the ability to make data entry easier and faster, but without a data professional involved they may create more data quality issues on the back end.
As an example, mobile applications can be used to improve data quality in customer master data management. Many companies struggle with getting their sales and consulting teams to keep customer account and contact information current. Traditional CRM applications are often not designed with good data management practices and ease of use in mind. As a result, searching, creating and updating customer data is difficult and cumbersome. Most sales employees choose not to keep their customer data up to date in the CRM systems, leaving it to the back office sales operations teams to maintain -- or, worse yet, maintaining it in their own personal contact applications. Redesigning the search for, creation of and updates to customer data, and then adding the “like” feature to rate and rank the relevancy can significantly change the engagement level of the sales and consulting employees and improve the data quality of these key customer data fields.
Designing for effective data management and ease of use in mobile applications consists of:
- Checking for data quality standards at creation time, including duplicates and address standardization. Don’t let the application designers tell you that mobile apps will take a performance hit. That shouldn’t be the case if designed appropriately.
- Limiting the free-form entry of critical fields. Use pull-downs and auto-populated fields. Google has established a standard by bringing up options as you type. Incorporating this into mobile design is a great way to engage the data creator to make it right the first time.
- Significantly reducing the number of screens and clicks required to create or update a record.
- Listing the most relevant (“liked”) records at the top of the screen. Limit the number of relevant records to five.
- Defaulting to the most used views and records.
- Planning for ongoing archiving of old records.
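Two of the items above, duplicate checking at creation time and type-ahead suggestions, can be sketched simply. The normalization rule and sample account names here are assumptions for illustration; a production system would use a proper matching engine and an address standardization service.

```python
import re

def normalize(name):
    """Lower-case and strip punctuation/whitespace so near-duplicates match."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def find_duplicates(new_name, existing_names):
    """Flag existing records whose normalized name matches the new entry."""
    key = normalize(new_name)
    return [n for n in existing_names if normalize(n) == key]

def suggest(prefix, existing_names, limit=5):
    """Type-ahead: offer at most `limit` matching records as the user types."""
    p = prefix.lower()
    return [n for n in existing_names if n.lower().startswith(p)][:limit]

accounts = ["Acme Corp.", "ACME Corp", "Acme Industries", "Apex Ltd"]
print(find_duplicates("acme corp", accounts))  # both "Acme Corp" variants
print(suggest("Ac", accounts))
```

Checks this lightweight add little latency, which is why a well-designed mobile app need not take the performance hit designers sometimes warn about.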
Mobile applications can and should be leveraged in your data management strategy. They offer the ability to simplify data entry and correction but they require the right design and involvement from data management professionals.
Cloud computing makes it easier for companies to deploy new business applications and to standardize their processes by using cloud offerings. Total cost of ownership is reduced as the licensing cost per user is far less than the total cost of hardware, software and manpower required to maintain an on-premise application. Cloud solutions offer a broad selection of software as a service as well as platform as a service packages. Examples include CRM, HR, employee expenses and countless other applications that can be used as building blocks.
The benefits of a standard cloud offering can also be a disadvantage when it comes to data management processes. Designing for simplicity, ease of use and high data quality standards requires additional functionality, yet most cloud packages offer limited data management customization. Adding more customization reduces some of the flexibility of a cloud offering and increases the total cost of ownership.
What should a data professional do? First, be involved in the selection of the cloud offering and evaluate the data management processes and ease of use for data entry. Consider these areas:
- What data quality standards are enforced in the cloud solution, and are they sufficient to maintain the quality of the data? Is there any flexibility in the data model?
- What customization is available in the cloud offering to add data quality checks, duplicate record analysis, and automated business rules for enrichment? Can you add modules to the application for this specific purpose?
- If customization is limited, what data enrichment will be required, and by whom?
- What functionality does the cloud provider make available for extract, transform and load (ETL) of the data?
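As a concrete way to frame the customization question, here is a minimal sketch of the kind of field-level business rules you would want a cloud package to enforce -- or, failing that, to run yourself against extracts. The rules, field names and sample record are hypothetical examples, not any vendor's checks.

```python
import re

# Hypothetical data quality rules for a cloud CRM extract.
RULES = {
    "email": lambda v: bool(re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v or "")),
    "postal_code": lambda v: bool(re.match(r"\d{5}$", v or "")),
    "country": lambda v: v in {"US", "DE", "FR", "JP"},
}

def failed_rules(record):
    """Return the fields of a record that violate a quality rule."""
    return [field for field, check in RULES.items()
            if field in record and not check(record[field])]

rec = {"email": "jane@example.com", "postal_code": "9021", "country": "US"}
print(failed_rules(rec))  # ['postal_code']
```

If the cloud offering cannot host rules like these, the cost of running them externally, and of the rework they catch, belongs in the total cost of ownership.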
Cloud offerings have the opportunity to improve the responsiveness and flexibility of your IT organization, but there are potential risks to data management. Make sure that you consider the additional cost of rework or data remediation in the total cost of ownership.
The fourth emerging technology to watch is in-memory databases. Recent technology advances are making in-memory databases more affordable, faster and generally more feasible for a range of businesses. As a result, industry experts expect the use of in-memory databases to grow significantly over the next 24 to 48 months. These databases are used to speed up transactional and analytic business processes. This is important because the nature of both transactional processing and analytics has evolved: companies can no longer afford to wait days or weeks to format data and then retrieve answers. In-memory databases allow real-time evaluations, predictions and adjustments to business processes.
In-memory databases can also be exploited for improved data quality management, especially data quality reporting.
Traditional data quality reporting involves moving the relevant data fields from the operational systems to the reporting platform where data quality business rules are run in order to develop the data quality scores that are then reported via some visualization software. This batch process occurs monthly or weekly, but certainly not daily or real time. Another issue with traditional data quality reporting is the time it takes to add a new field for evaluation; it could take weeks to model and add to the reporting platform. The typical result is that data quality reporting lags the business process changes.
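A batch data quality score of the kind described here can be as simple as a completeness measure: the share of critical fields that are actually populated. A minimal sketch, with invented sample records and field names:

```python
def completeness_score(records, fields):
    """Percentage of the critical field values that are populated."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f))
    return round(100 * filled / total, 1)

# Hypothetical contact records pulled from an operational system.
contacts = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"name": "Bo Chen", "email": "", "phone": "555-0100"},
]
print(completeness_score(contacts, ["name", "email", "phone"]))  # 66.7
```

In the traditional model, a score like this is computed weekly or monthly after the data is moved to a reporting platform, which is exactly the lag the article describes.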
While most business processes don’t require real time or daily data quality reporting, there are some processes where speed and performance in understanding the data quality of critical fields could materially affect the outcome. This is true for business processes such as:
- Mergers and acquisition data integrations
- Business and financial forecasting during quarter end close
- Fraud management at a call center
- Capital investment modeling
In-memory data quality reporting of the incoming data and the correction of data quality issues on a daily or real-time basis would also improve employee productivity and deliver faster time to value.
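The shift from batch to real-time reporting is essentially a shift from recomputing scores over the whole table to updating them as records arrive. A minimal sketch of an incrementally maintained completeness score (the class and field names are invented for illustration):

```python
class LiveQualityScore:
    """Maintain a completeness score incrementally as records arrive,
    instead of recomputing it in a weekly batch job."""

    def __init__(self, fields):
        self.fields = fields
        self.filled = 0
        self.total = 0

    def ingest(self, record):
        # Update running counts; no full-table rescan required.
        self.total += len(self.fields)
        self.filled += sum(1 for f in self.fields if record.get(f))

    def score(self):
        return round(100 * self.filled / self.total, 1) if self.total else 0.0

live = LiveQualityScore(["name", "email"])
live.ingest({"name": "Ann Lee", "email": "ann@example.com"})
live.ingest({"name": "Bo Chen", "email": ""})
print(live.score())  # 75.0
```

With an in-memory platform, the same idea extends to scoring incoming data the moment it lands, so quality issues can be corrected the same day rather than after the next batch run.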
As companies deploy in-memory databases, data professionals should exploit these capabilities for managing data operations. In doing so, they should also prepare for changes to their data operations processes in order to take advantage of these new capabilities.
Bringing It All Together
Data professionals should be involved in the evaluation, design and deployment of emerging technologies beyond big data. First and foremost, ensure these new technologies “do no harm” to the data quality programs in the company. Second, consider how these investments could be exploited to improve data quality and data management processes. Like big data, these technologies can bring significant benefits to your data strategy.