REVIEWER: Jeff Monica, manager of data quality for StorageTek.

BACKGROUND: StorageTek is a $2 billion global company that, through its information life cycle management strategy, enables businesses to align the cost of storage with the value of information. The company's innovative storage solutions manage the complexity and growth of information, lower costs, improve efficiency and protect investments.

PLATFORMS: Windows 2000 and Windows XP for DataFlux dfPower Studio. UNIX Solaris for the DataFlux dfConnector for Informatica.

PROBLEM SOLVED: In the early 2000s, StorageTek began building a corporate information factory (CIF), consisting of a data warehouse and multiple data marts. The CIF is designed to capture data from operational systems and assemble it into a usable format to allow users to make better decisions. The CIF integrates data from more than 60 operational systems (such as SAP R/3 and Siebel) and data sources residing in multiple countries. The CIF includes data about customers, products, finances, services/support and the supply chain - a true global view of corporate data. To build high-quality data within the CIF, the company established a set of data management procedures to inspect, standardize, correct and validate information according to corporate protocols. The company also needed a way to control data quality over time, allowing data owners to get instantaneous feedback on data quality problems before the overall level of data integrity declined.

PRODUCT FUNCTIONALITY: StorageTek chose DataFlux dfPower Studio, a comprehensive data management solution that encompasses data profiling, traditional data quality functionality and in-depth data monitoring capabilities. Initially, the company used dfPower Studio while developing the CIF to standardize data from source systems, analyze source system metadata and perform other common data quality tasks. After the development cycle, StorageTek transitioned to a production phase, concentrating on extending the use of the CIF and controlling data quality over time. The CIF team works with data owners/stewards on the business side to establish metrics for data quality. With dfPower Studio's data monitoring functionality, the system automatically tracks data quality metrics, and users can receive e-mails or system alerts when data exceeds pre-set limits.

STRENGTHS: With DataFlux ODBC connections and the DataFlux DB Record Viewer, users can quickly query multiple source systems and export data to a text file or Excel spreadsheet. Additionally, the data profiling functionality is excellent, allowing users to analyze a data quality issue or a new source system before connecting to the CIF. Also very helpful is the ability to easily obtain frequency distributions, blank counts, null counts, min/max lengths and pattern frequencies.
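For readers unfamiliar with these profiling measures, the following is a minimal Python sketch of how they can be computed for a single column. It is not DataFlux code and does not reflect how dfPower Studio works internally; the names `char_pattern` and `profile_column`, the sample phone values and the pattern convention (digits become `9`, letters become `A` or `a`) are assumptions made for illustration.

```python
from collections import Counter

def char_pattern(value):
    """Reduce a string to its shape: digits -> '9', letters -> 'A'/'a'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)

def profile_column(values):
    """Compute basic profile measures for one column of string values.

    None is treated as a null; empty or whitespace-only strings are blanks.
    """
    non_null = [v for v in values if v is not None]
    non_blank = [v for v in non_null if v.strip() != ""]
    lengths = [len(v) for v in non_blank]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "blank_count": len(non_null) - len(non_blank),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "frequencies": Counter(non_blank),
        "pattern_frequencies": Counter(char_pattern(v) for v in non_blank),
    }

# Hypothetical sample column of phone numbers
phones = ["303-555-0100", "303-555-0101", None, "", "3035550102", "303-555-0100"]
profile = profile_column(phones)
```

In this sample, the pattern frequencies show that one value departs from the dominant `999-999-9999` format, which is the kind of finding a profiling pass is meant to surface before a source is connected to a warehouse.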

WEAKNESSES: Because many data quality issues involve multiple tables, dfPower Architect needs to support the correction and integration of more than two tables. When working with a scheme in the Analysis Editor, it would be helpful to sort the data or the standards alphabetically regardless of case. Currently, the editor lists all uppercase values before any lowercase values.
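The sorting behavior described here resembles a plain character-code sort, in which every uppercase letter precedes every lowercase letter. The short Python sketch below contrasts that ordering with a case-insensitive one; the sample values are invented, and the example only illustrates the two orderings, not the editor's actual implementation.

```python
# Hypothetical scheme values with mixed capitalization
values = ["apple", "Banana", "cherry", "Apple"]

# Default sort compares character codes, so uppercase values come first
case_sensitive = sorted(values)

# Sorting on a case-folded key gives a case-insensitive alphabetical order
case_insensitive = sorted(values, key=str.casefold)

print(case_sensitive)    # ['Apple', 'Banana', 'apple', 'cherry']
print(case_insensitive)  # ['apple', 'Apple', 'Banana', 'cherry']
```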

SELECTION CRITERIA: StorageTek required a multifaceted tool capable of going beyond data cleansing tasks typically associated with data quality technology. dfPower Studio provided best-of-breed profiling capabilities, powerful data quality tools, and innovative monitoring functionality - all from the same user interface.

DELIVERABLES: dfPower Studio allowed StorageTek to effectively integrate data into the CIF during the development stage. The procedures and business rules that the CIF team built with business users to improve data quality reside in the DataFlux Quality Knowledge Base, a repository used by all DataFlux applications to share information about data types and definitions. During the production phase, business users who understand the basic requirements for data quality are an integral part of setting control metrics. These metrics can be applied to null counts, pattern counts, uniqueness and other measures critical to each data set. Once established, these control metrics can provide feedback on data quality issues - before they compromise the data integrity of the CIF.
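As a rough illustration of the control-metric idea, the Python sketch below measures a column against limits set in advance and reports any metric that exceeds its limit. It is a simplified sketch, not the DataFlux monitoring feature: the function name `check_control_metrics`, the metric names, the limit values and the sample customer IDs are all hypothetical.

```python
def check_control_metrics(values, limits):
    """Return (metric, measured, limit) for every metric above its limit.

    `limits` maps a metric name to the maximum acceptable rate (0.0 to 1.0).
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    duplicates = len(non_null) - len(set(non_null))
    measured = {
        # share of rows with no value
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        # share of non-null rows that repeat an earlier value
        "duplicate_rate": duplicates / len(non_null) if non_null else 0.0,
    }
    return [
        (name, measured[name], limit)
        for name, limit in limits.items()
        if measured[name] > limit
    ]

# Hypothetical customer ID column and limits agreed with a data steward
customer_ids = ["C001", "C002", None, "C002", None]
alerts = check_control_metrics(
    customer_ids, {"null_rate": 0.1, "duplicate_rate": 0.5}
)
for name, value, limit in alerts:
    print(f"ALERT: {name} is {value:.2f}, above the limit of {limit:.2f}")
```

Here only the null rate breaches its limit, so a single alert is raised; in a production setting the alert would be routed to the data owner rather than printed.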

VENDOR SUPPORT: DataFlux provides fast turnaround on support and consulting requests.

DOCUMENTATION: The documentation has been of limited value.
