4 key steps to building a comprehensive data strategy
It’s widely recognized today that data is a strategic asset, and perhaps the most important asset an enterprise has available to it. With growing amounts of data, organizations can either capitalize on, or forfeit, strategic opportunities to deliver greater value to customers, improve operational efficiencies and gain competitive advantages.
As chief data officers and data scientists play more prominent roles in developing data strategies in many enterprises, we see organizations struggling to contend with the challenges of growing data volumes and taking a shortsighted ‘save it now, worry about it later’ approach.
These situations are worsening as data becomes more active and distributed across an enterprise. Many groups and individuals implement unique or customized data management and storage solutions, which often begin as unmanaged ‘aaS’ (as a service) projects and evolve into critical production systems with challenging data governance, security, access and cost management dynamics. Organizations that invest in developing and implementing a strategic data plan are fundamentally better prepared to anticipate, manage and capitalize on the increasing challenges and possibilities of data.
The Data Strategy
There are many things to consider in developing a data strategy, such as the organization’s objectives, environment, clients, applications, infrastructure, management, security requirements, budgets and resources. However, strictly as it relates to data, the priority should be to define the following parameters first:
- The right retention duration for every type of data.
- Where more or less data access and control is required by stakeholders, such as CDOs, data scientists, data architects and analysts, application owners and users, IT personnel and executives.
- The data cost/performance requirements across all the workloads and users of the data.
- The capacity and performance scalability requirements for present-day and future applications.
As it relates to data retention, requirements can be difficult to determine and administer. Data captured today may become invaluable years from now, especially with accelerating advancements and widespread applications in analytics and machine learning. Therefore, developing a ‘data forever’ strategy where some data is indefinitely preserved may be required.
Identifying when specific stakeholders require greater access to and more control of the data is also a priority. It is equally important to define the ingest and egress performance, capacity and access requirements over time for the many types of data in an organization, so that data storage architectures meet both today’s and tomorrow’s business needs.
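As a rough illustration, the four parameters above could be recorded as one policy entry per type of data. The sketch below is only one possible shape: the class name `DataPolicy`, its field names, the tier labels and all example values are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical per-data-type policy covering the four parameters."""
    data_type: str
    retention_days: Optional[int]         # None models a 'data forever' policy
    privileged_roles: FrozenSet[str]      # stakeholders needing more access and control
    storage_tier: str                     # assumed cost/performance label
    projected_growth_tb_per_year: float   # assumed capacity scalability estimate


def retention_expired(policy: DataPolicy, created: date, today: date) -> bool:
    """Return True once data created on `created` is past its retention window."""
    if policy.retention_days is None:
        return False  # indefinitely preserved data never expires
    return today > created + timedelta(days=policy.retention_days)


# Illustrative values only; real requirements depend on the organization.
POLICIES = {
    "web_logs": DataPolicy(
        "web_logs", 365, frozenset({"it_ops"}), "capacity", 40.0
    ),
    "sensor_history": DataPolicy(
        "sensor_history", None, frozenset({"cdo", "data_scientist"}), "archive", 120.0
    ),
}
```

Keeping the retention window optional lets a single record type cover both time-limited data and data that must be preserved indefinitely.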
The areas of the data strategy that require specific consideration include: (1) archival; (2) backup & recovery; (3) analytics; and (4) security.
Archive
Archives may be needed for audits, legal inquiries, long-term digital asset preservation, future use and mining, machine learning, monetization and more. Key considerations for archives are the cost of storage, preserving the bit-level integrity of the data, and ensuring that the data is both secure and accessible over many years of preservation. Many companies are moving away from using tape for archives to overcome its data durability and access limitations. Scalable disk-based archive systems should be considered to ensure the highest levels of ongoing data integrity and access performance for archive data.
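One common way to check bit-level integrity is to record a cryptographic digest when data is archived and recompute it later. The sketch below assumes SHA-256 and uses hypothetical helper names (`sha256_of_stream`, `is_intact`); it is not a description of how any specific archive system verifies data.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so large objects fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream: BinaryIO, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at archive time."""
    return sha256_of_stream(stream) == recorded_digest


# Usage: record a digest at ingest, then detect a single flipped bit later.
original = b"quarterly sensor readings"
recorded = sha256_of_stream(io.BytesIO(original))
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(is_intact(io.BytesIO(original), recorded))  # unchanged copy
print(is_intact(io.BytesIO(flipped), recorded))   # copy with one bit altered
```

A check like this only detects corruption. Repairing it still requires a second copy or some other form of redundancy.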
Backup & Recovery
The need to restore data can arise for many reasons including unexpected infrastructure and site outages, unintended deletions, and the increase in malware that can quickly render critical data unreadable. Malicious ransomware, in particular, can suddenly present an immediate threat to an organization’s ability to produce and deliver products and services, bringing global enterprises to a standstill overnight. Recovery from such an attack often involves restoring hundreds or even thousands of servers in parallel, as quickly as possible. Disk-based backup systems located in proximity to server pools can significantly accelerate the restoration of data.
Analytics
Extracting insight and value from data through algorithms and intelligent queries also needs to be a core part of the data strategy. The recent rate and scale at which companies can capture, store and analyze vast amounts of real-time data relating to all facets of their operations and customer activities has fundamentally transformed how organizations learn, adapt, improve and compete.
This digital transformation that combines big data and enterprise-scale analytics enables decisions to be made faster, with greater accuracy, and with real-time feedback on results. Data analytic strategies must align with how and where data is generated and captured, where it is processed and analyzed, and how it is distributed and stored across an enterprise.
Security
As an example, the European Union (EU) General Data Protection Regulation (GDPR) becomes enforceable in May 2018, and with it, organizations can face significant compliance penalties. These types of data regulations can force organizations to understand their data privacy risks and take the appropriate measures to reduce the risk of unauthorized disclosure of consumers’ private information. Data strategies must consider and deploy technologies, as well as secure processes, such as comprehensive threat detection and intrusion prevention, strict user authentication and access controls, data encryption, data immutability, and secure erase features, to build a comprehensively secure data environment.
On Premises, Off Premises or Hybrid?
Today’s organizations face choices and tradeoffs for where and how they deploy their applications and IT infrastructure, including the use of traditional IT, private cloud and public cloud models deployed either on- or off-premises in private data centers, co-location (colo) facilities, or distributed cloud data centers. Many organizations are choosing a hybrid approach that combines traditional IT resources with private and public cloud models.
Hybrid cloud models can provide fast and flexible development and deployment options with public cloud services for contemporary applications, while retaining the security features, management control and predictable economics of private infrastructure for certain workloads, applications and use cases. They also allow organizations to flexibly choose the best IT infrastructure for each element and aspect of their application workflows.
Data strategies in a hybrid cloud model must contemplate and balance several considerations in determining the best approach for each organization, for each type of data. Although hybrid cloud models are relatively new, their complexity is increasing as organizations expand into multiple clouds while also seeking to maintain control and address security and cost concerns. Interoperability between various clouds is also a concern, as a wide variety of APIs are orchestrated and provisioned differently.
Other factors to consider include the prohibitive costs and complex licensing associated with significant data movement between cloud environments. Western Digital has developed a hybrid cloud corporate data strategy that mitigates many of these concerns by maintaining a complete copy of our data on-premises in scalable object storage systems. This reduces costs and data movement to and from the cloud, while ensuring a secure and durable archive and a fast, local backup/recovery solution across our global enterprise.
A comprehensive data strategy is essential in determining an organization’s capabilities, agility and preparedness in managing risks, as well as its efficiency and competitiveness in rapidly changing markets. Aligning your data strategy with the objectives of the organization is key, with focus on your unique data retention, access and cost/capacity/performance requirements – both for today’s needs as well as scaling to meet future needs.
Attention should also be given to the core functions of a data strategy: archive, recovery, analytics and security. With a data strategy defined and implemented, organizations can confidently answer the question ‘do you know where your data is?’