The Data Landscape Metamorphosis
As more businesses recognize the importance of managing their data to gain business intelligence (BI), it is becoming evident that the data solutions on the market today do not truly deliver it. Most data solutions target only one kind of data: structured data that resides in databases and other structured sources. Industry research indicates that up to 85 percent of a company's data resides in areas classified as unstructured.1 These unstructured sources include everything from emails, Word documents and client service notes to videos and photos. Furthermore, as technology continues to evolve, more options for data storage are created, including instant messages, text messages, wikis and blogs, causing the stores of unstructured data to grow exponentially. While all of these new tools and resources increase communication, they also represent information loss, since many of these communications are not organized, analyzed or logged. The result is an information void in corporate decision making and planning.
Data quality is recognized as one of the most important processes in data management. It is a step that needs to be in place before data from any type of source, structured or unstructured, is used within the enterprise. Without a comprehensive data quality management framework, it is almost impossible to reach the ultimate goal of data intelligence. The introduction of unstructured data creates one of the biggest challenges in data quality management.
Many companies are committing substantial resources to harnessing their data and fixing these intrinsic problems in order to gain the most complete information and protect themselves from possible regulatory or compliance issues. As more and more businesses move from data analysis to information insight in their decision-making, the handling of structured data, unstructured data and data quality management is becoming increasingly important.
The Commercial Software Gap
Once organizations grasp the full financial impact of their data issues, they usually begin evaluating solutions to address the problem. This search typically returns a group of vendors who provide structured/unstructured data integration and data quality solutions built on proprietary software models and tied to lengthy term commitments with implementation and services agreements. Furthermore, these solutions are generally discrete pieces of software, each addressing one specific area of the data integration challenge. Even when they come from the same vendor, they are typically not well integrated and use different programming languages and interfaces.
Today's marketplace for data integration and data quality does not offer a single solution that handles structured data and unstructured data and provides a data quality framework on an integrated platform. Creating interfaces to existing commercial solutions that analyze and cleanse structured and unstructured data easily becomes a costly, lengthy and daunting task for organizations. The lack of flexibility and the prevalence of cookie-cutter data solutions in the marketplace make integrating unstructured data analysis into the structured data analysis process a seemingly impossible task. Hence, many organizations simply defer the unstructured data issue and address only their structured data issues, leaving their companies with an insight void in decision-making. Companies that do embark upon unstructured data analysis in concert with a commercially available data solution find themselves encumbered with a complex, inefficient process marked by multiple development platforms, custom programming, extensive testing, proprietary interface development and lengthy implementation time frames.
Furthermore, because of license restrictions, implementing a commercial solution across the enterprise at multiple touchpoints becomes prohibitively expensive once both structured and unstructured data integration and data quality management are considered. This makes data intelligence achievable only by organizations with the deepest pockets and the most resources.
Achieving Data Intelligence and Insight with Open Source Data Solutions
While open source continues to gain acceptance and utilization across business enterprises, comprehensive open source data solutions are not in wide use in the marketplace. However, open source-based data solutions are especially well equipped to address the issue of comprehensive data integration and data quality management with structured and unstructured data.
To achieve complete data intelligence, a data solution is needed that can seamlessly integrate the tools that address unstructured data and structured data and, in addition, provide a comprehensive data quality management toolset. Industry research indicates that piecemeal approaches to unstructured data analysis should be avoided because they are intrinsically inefficient and because a segmented approach could negatively impact the compiled data results.2 Open source is especially well suited to provide this key functionality. Open source solutions are intrinsically flexible and allow customization, enabling integration of the disparate tools to provide a comprehensive, accurate view of the data. The flexibility of open source-based solutions enables an organization to interface easily with unstructured data solutions without the costly reprogramming required of a cookie-cutter solution. Open source solutions can be customized to address the specific needs of an unstructured data tool and build the bridge to the structured data solution, ensuring that data quality and integration of structured and unstructured data are performed within a single integrated process rather than in multiple disparate processes.
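To make the idea of a single integrated process more concrete, the following is a minimal sketch in Python, not a description of any particular product. It assumes that structured rows and free-text documents can both be mapped to one common record type, so that the same data quality rules run once over both. Every name in it (Record, from_row, from_text, quality_issues, run_pipeline) and the email-based rules are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Deliberately simple email pattern, used only for this illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Record:
    """Common shape shared by structured and unstructured inputs (hypothetical)."""
    source: str            # "structured" or "unstructured"
    email: Optional[str]   # contact email, if one is known or could be extracted
    text: str              # free-text content or notes


def from_row(row: dict) -> Record:
    """Map a structured row (for example, a database record) to a Record."""
    return Record("structured", row.get("email"), row.get("notes", ""))


def from_text(text: str) -> Record:
    """Map a free-text document to a Record by extracting the first email found."""
    match = EMAIL_RE.search(text)
    return Record("unstructured", match.group(0) if match else None, text)


def quality_issues(rec: Record) -> List[str]:
    """Apply one shared set of quality rules, regardless of the record's source."""
    issues = []
    if not rec.email:
        issues.append("missing email")
    elif not EMAIL_RE.fullmatch(rec.email.strip()):
        issues.append("malformed email")
    if not rec.text.strip():
        issues.append("empty text")
    return issues


def run_pipeline(rows: List[dict], documents: List[str]) -> Tuple[list, list]:
    """Normalize both kinds of input, then check and cleanse them in one pass."""
    records = [from_row(r) for r in rows] + [from_text(d) for d in documents]
    clean, rejected = [], []
    for rec in records:
        issues = quality_issues(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            rec.email = rec.email.strip().lower()  # standardize accepted values
            clean.append(rec)
    return clean, rejected
```

Calling run_pipeline with a list of row dictionaries and a list of document strings returns the cleansed records and the rejected ones, each rejected record paired with the reasons it failed. Under the stated assumptions, the quality rules live in one place, so a change to a rule applies to structured and unstructured sources alike.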









