CATEGORY: Data Quality, Profiling & Augmentation
REVIEWER: Paul Nettle, data cleansing manager, Defence Logistics Organization, Ministry of Defence.
BACKGROUND: The Defence Logistics Organization (DLO) exists to sustain UK military capability, current and future. Its role is to deliver effective and integrated logistics support and information services to the front line and across the Ministry of Defence (MoD) at best value for defense. The UK Armed Forces comprise the Royal Navy, the British Army and the Royal Air Force.
PLATFORMS: We run Avellino Discovery on an IBM DRS 6000 under AIX with Microsoft Windows at the desktop. Many other environments are supported.
PROBLEM SOLVED: Historically, each of the three UK Armed Forces had an entirely separate supply organization. These have been restructured into one supply chain under the DLO, although it is still supported by three disparate central supply IT systems. The IT systems behind the three operations are entirely different, for example, in business processes, hardware platforms, operating systems, data design and the codes used for inventory items (e.g., packaging, unit of issue or hazard classification). The role of The Cleansing Project (TCP) is to ensure the data from these disparate systems is made ready for successful integration into a single DLO IT system. To do so, we need to understand the source data and determine its quality, its accuracy and its conformance to standards such as NATO stock codes and other requirements. Because the three central supply systems contain more than 1.7 million records in total, it would be physically and economically impossible to analyze this data manually. The DLO selected a number of tools to automate the data cleansing process, and Avellino Discovery was chosen to profile and analyze the source data and to highlight data quality problems automatically.
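As an illustration of what a conformance check against a coding standard might look like, the sketch below flags codes that do not match a 13-digit pattern. It assumes that "NATO stock codes" refers to NATO Stock Numbers written as 13 digits, optionally grouped 4-2-3-4 with hyphens. The function name and the sample values are hypothetical, and this is not how Avellino Discovery or TCP actually performs the check.

```python
import re

# Assumption: a conforming code is 13 digits, optionally hyphenated as 4-2-3-4.
NSN_PATTERN = re.compile(r"[0-9]{4}-?[0-9]{2}-?[0-9]{3}-?[0-9]{4}")

def nonconforming(codes):
    """Return the codes that do not match the assumed 13-digit pattern."""
    return [c for c in codes if not NSN_PATTERN.fullmatch(c.strip())]

# Made-up sample values for illustration only.
sample = ["5305-00-123-4567", "5305001234567", "5305-00-123", "ABC"]
bad = nonconforming(sample)
```

A pattern check of this kind only tests the shape of a code; whether the code refers to a real item would still need a lookup against reference data.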
PRODUCT FUNCTIONALITY: Avellino Discovery enables us to automatically identify data quality problems, thus ensuring that the DLO efficiently delivers the right goods to the right place at the right time. In the two years following project commencement, TCP has delivered supply chain cost savings of approximately $30 million and removed data inaccuracies and inconsistencies, leading to the elimination of duplicate and obsolete items. Discovery has identified significant data quality problems within millions of data rows. It has done this quickly and economically. We also use it to monitor data quality over time, which means we can see whether dirty data is being cleaned speedily and new data is being assimilated accurately.
STRENGTHS: We have calculated that in our environment, Discovery can carry out approximately 178 person-years of manual analysis in just one hour. It's fast, it handles very large data volumes (terabytes) and we have found it to be totally accurate. Its architecture also offers a very valuable and fast drill-down feature enabling the user to display a set of rows, an individual row and even individual attributes. The graphical representation of analysis is excellent and means not only that users can quickly absorb the findings, but also that users can effectively communicate the impact of data issues to business decision-makers.
WEAKNESSES: The current version allows for only one-to-one attribute analysis. However, we require concurrent, multiple-attribute analysis. Therefore, we need to manipulate data before we can analyze it. (Multiple-attribute analysis is scheduled for the next Avellino release.) The system demands high-power desktop systems at the client end.
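The review does not say how the data is manipulated. One plausible form, sketched here purely as an assumption, is to join several attributes into a single derived column so that a tool limited to one attribute at a time can still analyze the combination. The function name, field names and separator are hypothetical.

```python
def add_composite(rows, fields, name="composite", sep="|"):
    """Return copies of the rows with an extra column joining the given fields."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows are left unchanged
        new_row[name] = sep.join(str(row.get(f, "")) for f in fields)
        out.append(new_row)
    return out

# Hypothetical inventory rows with two coded attributes.
rows = [
    {"unit_of_issue": "EA", "packaging": "BOX"},
    {"unit_of_issue": "EA", "packaging": "DRUM"},
]
combined = add_composite(rows, ["unit_of_issue", "packaging"])
```

The separator should be a character that cannot occur in the source values, otherwise two different combinations could produce the same composite string.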
SELECTION CRITERIA: Avellino Discovery was selected because of the product's stronger performance on complex test data, its scalability and the strengths described previously in this review. The attitude and capability of the vendor and its staff were also determining factors.
DELIVERABLES: Avellino Discovery has enabled us to gain a comprehensive understanding of our data and its quality. From its analysis, Discovery creates meta data, including values, ranges and frequencies, statistics and data definitions, and highlights inconsistencies between actual data and supplied meta data. It identifies redundant and duplicate data, missing values and inconsistencies with standards; discovers joins, keys and data rules; and generates entity-relationship diagrams.
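To make the profiling outputs named above concrete, here is a minimal sketch of a per-attribute profile (missing values, distinct values, range, frequencies) and a duplicate-key check. It illustrates the general technique only and is not Avellino Discovery's implementation; the function names and sample data are hypothetical.

```python
from collections import Counter

def profile_column(values):
    """Summarise one attribute: row count, missing values, distinct values, range, top frequencies."""
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    freq = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(freq),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "top": freq.most_common(3),
    }

def duplicate_keys(rows, key):
    """Return the key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical unit-of-issue column and item rows.
profile = profile_column(["EA", "EA", "BX", "", None])
dupes = duplicate_keys([{"id": "A1"}, {"id": "B2"}, {"id": "A1"}], "id")
```

Comparing such a profile against the supplied data definitions (for example, an attribute documented as mandatory that shows missing values) is one way inconsistencies between actual data and meta data can surface.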
VENDOR SUPPORT: Avellino and its staff are friendly and responsive. They have helped us get the best from Discovery.
DOCUMENTATION: Manuals are comprehensive. However, while online help is extensive, there is always room for improvement (but this is true of any software product).