A chief financial officer (CFO) was approached by the CEO and asked for an accounting of the company’s financial assets. The CFO gave a vague response indicating a lack of knowledge of the corporate bank accounts, little idea of what was in each account and no idea about the status of accounts receivable. The board of directors asked the CEO about the intended use of the corporate assets and was told, "There is no plan for their use." The CFO and the CEO were soon looking at new employment opportunities.
Today, if most CIOs are asked about the assets under their control, a primary asset being data, most would be forced to respond that there is no inventory of data, little is known about the quality of the data and "there is no plan for the productive use of this asset." The turnover in CIOs is also high.
Current Status in Contemporary Organizations
Very few organizations, large or small, have a well-defined data strategy. If asked, some will point you to dusty and outdated volumes of database standards, usually geared specifically to their relational database management system (RDBMS). The more advanced organizations will have a subset of standards and perhaps a documented strategy on portions of what should be included in an overall strategy.
In most organizations, the value of data is not well understood. Data is considered the province of the department that creates it, and that department often jealously guards this data.
Data is usually addressed on a piecemeal basis. A company will launch an effort to choose its preferred RDBMS or will attack a database performance problem when response time becomes excessive. Rarely do organizations work from the big picture. As a result, they suboptimize solutions, introduce programs that may have a deleterious effect on the overall enterprise, cause inconsistencies that require major interfacing efforts or develop systems that cannot be easily integrated.
Why a Data Strategy is Needed
Not having a data strategy is analogous to a company allowing each department and each person within each department to develop their own chart of accounts. The empowerment would allow each person in the organization to choose his or her own numbering scheme. Existing charts of accounts would be ignored as each person exercised his or her own creativity. Even to those of us who don’t wear green eyeshades, the resulting chaos is obvious.
The chaos without a data strategy is not as obvious, but the indicators abound: dirty data, redundant data, inconsistent data and users who are becoming increasingly dissatisfied with the performance of IT.
Without a data strategy, the people within the organization have no guidelines for making decisions that are absolutely crucial to the success of the IT organization. In addition, the absence of a strategy gives a blank check to those who want to pursue their own agendas. This includes those who want to try new database management systems, new technologies (often unproved) and new tools. This type of environment provides no checks for those who might be pursuing a strategy that has no hope for success.
A data strategy should result in the development of systems with less risk and a higher success rate. It should also result in much higher quality systems. A data strategy provides a CIO with a rationale to counter arguments for immature technology and for data strategies that are inconsistent with existing strategies.
The vision of a data strategy that fits your organization has to conform to the overall IT strategy, which in turn must conform to the strategy of the business. The vision should reflect where the organization wants to be in five years.
Components of a Data Strategy
This is a list of the primary components of a data strategy:
Relational Database Management System (RDBMS) – which RDBMSs are the standard for which platforms (OS/390, UNIX, Windows) and for which applications (e.g., OLTP, business intelligence)
Data Quality
- Management awareness and support for data quality
- Evaluation/diagnosis – identify data quality problems
- Which source data is the most correct
- Valid values (domains)
- Business rules (e.g., number of dependents cannot be negative)
- Data types (hex, packed decimal, etc.)
- Missing data – completeness
- Inappropriate defaults
- Field used for multiple purposes
- Accurate data
- Cleanliness/standardization/availability of historical data – codes have changed, some data is new
- Timeliness of data – current data
- Validation of extract/transform/load (ETL) process – target created correctly
- Triage – which data to clean
- Cost of cleansing – some efforts not worth it
- Prioritization of cleansing effort – order in which data will be cleaned
- Responsibility for data quality – it’s not enough to say that data quality is everyone’s responsibility
Meta Data
- Management support for meta data
- Which meta data to capture?
- Responsibility for capturing meta data
- Responsibility for maintenance
- Business meta data
- Technical meta data
- How will meta data be captured?
- How will meta data be used?
- Tools that generate meta data
- Software/tools to capture/maintain meta data
- Capacity planning
- Roles and responsibilities
- Reporting performance
- Perception by management and user department
- Proactive vs. responsive modes
- Standards for what to distribute and when and where
- Cost to distribute
- Administration activities to distribute
Organization – Data-Related Roles and Responsibilities
- Database administrator
- Data administrator
- Data quality administrator
- Architect (be careful, this has many meanings)
- Determine requirements for performance
- Determine requirements for availability
- Determine historical requirements
Security and Privacy
- Responsibility for determining security and privacy
- Mechanism for establishing security and privacy procedures
- Audit of security
- Regulatory issues (many are industry specific)
Total Cost of Ownership
Subject Area Databases
- Meta data
- Inventory of data models
- How and where should the models be developed and used
- Industry models (finance, healthcare, insurance, etc.)
- Motivation/incentives to share
- Management directives on sharing
- Goals and objectives
- Data mining
- Integrating business data
- Data redundancy
- Different RDBMSs
- Inventory of legacy/operational data
- Evaluation for use/quality
- Databases required by the application packages
- Database choices
- Impact of package on capacity requirement, performance and availability
- Organization standards
- Criteria for selection
- Responsibility for selection
- Single vendor/best-of-breed
- Relationship with vendors
- Money issues
- Types/format of data (Access, Excel)
Categorization of Data
Communicating and Selling the Data Strategy
- Technical folks
- Cost justification
- Tools to measure
- Roles and responsibility for measurement
- Measuring usage
- Reporting to management
- Ability to manage the environment
- Service level agreements (SLAs)
Unstructured Data Types
The importance of these components will vary from organization to organization, but taken together their importance is compelling. They cannot all be attacked at once. A triage approach will identify those components that are critical and must be addressed first.
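Several of the data quality items in the component list (missing data, valid values, business rules such as non-negative dependents, and inappropriate defaults) lend themselves to automated checks. The following is a minimal sketch of such an evaluation step, not a prescribed method. The field names, the set of valid state codes and the suspect default date are all hypothetical and would come from an organization's own standards.

```python
from typing import Dict, List

# Hypothetical domain and default definitions, for illustration only.
VALID_STATES = {"CA", "NY", "TX"}                 # valid values (domain)
SUSPECT_DEFAULTS = {"birth_date": "1900-01-01"}   # inappropriate defaults
REQUIRED_FIELDS = ("customer_id", "state", "dependents", "birth_date")


def check_record(record: Dict[str, object]) -> List[str]:
    """Return the data quality problems found in a single record."""
    problems: List[str] = []

    # Missing data (completeness): None or empty string counts as missing.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing:{field}")

    # Valid values: a present state code must belong to the domain.
    state = record.get("state")
    if state not in (None, "") and state not in VALID_STATES:
        problems.append("invalid_domain:state")

    # Business rule: number of dependents cannot be negative.
    dependents = record.get("dependents")
    if isinstance(dependents, int) and dependents < 0:
        problems.append("business_rule:dependents_negative")

    # Inappropriate defaults: a placeholder value stored as if it were real.
    for field, default in SUSPECT_DEFAULTS.items():
        if record.get(field) == default:
            problems.append(f"suspect_default:{field}")

    return problems


if __name__ == "__main__":
    sample = {"customer_id": 2, "state": "ZZ", "dependents": -1,
              "birth_date": "1900-01-01"}
    for problem in check_record(sample):
        print(problem)
```

Counting how often each problem label appears across a source file gives a rough input to the triage and prioritization decisions listed above, since it shows where cleansing effort would touch the most records.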
Without a comprehensive data strategy, organizations are not making optimal use of one of their key assets: data. Without a data strategy, organizations have little basis for making tactical and strategic decisions that can spell the difference between an organization’s success and failure.
Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author of Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or through his Web site at www.sidadelman.com.