Unless your organization is the smallest of small mom-and-pop shops, you’ve got multiple stores of data residing in numerous formal databases and in innumerable not-so-formal nooks and crannies across your enterprise.

In all likelihood, you’ve got gigabytes, terabytes or even larger volumes of various types of data scattered about - from mainframes to smartphones and other mobile devices - that are not serving your organization’s mission and business strategies as effectively as they could.

As data increasingly becomes a competitive asset that informs business strategies and execution, we’ve seen a trend emerging in which organizations are moving from a data center view of IT to a data-centric view of IT. The point is that an organization’s data, not the traditional centralized data center, should be the focus of IT activities and strategic planning. It is now more important to be able to make data available at the point of decision-making, rather than sequestering it in a single, fortress-like facility.

This trend requires IT to think and act beyond its traditional focus on storing and protecting enterprise data (which continue to be important), and to focus more on distributing data to the point of opportunity and action.

That’s why it’s critical to implement distributed data management systems. Distributed data management is the only way to ensure that the decisions you make, and the actions you take as a result, are based on accurate, complete and consistently defined facts.

There is no one way to do distributed data management. Every organization is unique, and the strategy or strategies you adopt must be aligned with the particular business needs you’re addressing. With that in mind, consider the following typical scenarios.

Reporting

Executives and managers in every organization need to generate and analyze reports, both for record-keeping purposes and as the basis for important business decisions. They may need to view reports with varying frequency - at the start of the day, at the end of the day and periodically throughout the day. Creating these reports requires someone to access and assemble a great deal of data that may or may not all reside in the same place. To avoid negatively impacting operational databases, organizations characteristically offload the required report data to a reporting server, data mart or data warehouse. Depending on end-user requirements, moving this data to a location local to report generators and consumers may require batch loading, extract, transform and load (ETL) and/or replication technologies, along with a mechanism to keep the local data synchronized with the same data residing in the organization’s master database.
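To make this concrete, here is a minimal sketch in Python of an incremental batch load from an operational database into a reporting copy. It uses SQLite purely for illustration; the orders table, its columns and the watermark scheme are assumptions made for the example, not features of any particular product.

```python
import sqlite3

# Sketch only: table and column names (orders, updated_at, etc.) are
# hypothetical. The idea: extract just the rows changed since the last
# load, transform them, and upsert them into the reporting copy.

def sync_orders_to_reporting(source_path, reporting_path):
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(reporting_path)
    dst.execute("""CREATE TABLE IF NOT EXISTS orders_reporting (
                       order_id INTEGER PRIMARY KEY,
                       customer_id INTEGER,
                       total_usd REAL,
                       updated_at TEXT)""")
    dst.execute("""CREATE TABLE IF NOT EXISTS load_watermark (
                       id INTEGER PRIMARY KEY CHECK (id = 1),
                       last_loaded TEXT)""")
    row = dst.execute("SELECT last_loaded FROM load_watermark").fetchone()
    watermark = row[0] if row else ""

    # Extract: touch the operational database briefly instead of
    # rescanning it in full on every run.
    changed = src.execute(
        "SELECT order_id, customer_id, total_cents, updated_at "
        "FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (watermark,)).fetchall()

    # Transform (cents -> dollars) and load with upsert semantics so the
    # reporting copy stays synchronized with the master data.
    dst.executemany(
        "INSERT INTO orders_reporting VALUES (?, ?, ?, ?) "
        "ON CONFLICT(order_id) DO UPDATE SET "
        "customer_id = excluded.customer_id, "
        "total_usd = excluded.total_usd, "
        "updated_at = excluded.updated_at",
        [(oid, cid, cents / 100.0, ts) for oid, cid, cents, ts in changed])

    if changed:
        dst.execute(
            "INSERT INTO load_watermark (id, last_loaded) VALUES (1, ?) "
            "ON CONFLICT(id) DO UPDATE SET last_loaded = excluded.last_loaded",
            (changed[-1][3],))
    dst.commit()
    src.close()
    dst.close()
    return len(changed)
```

Run on a schedule, a job like this keeps the reporting copy current without putting sustained query load on the operational system; replication technologies do the same job continuously rather than in batches.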

Workload Distribution

With many individuals in many locations all needing access to particular data to do their jobs, it’s impractical and performance-degrading to have them all querying the same centralized data store. Distributed data management can play a very valuable role in instances like these.

For example, in financial services organizations, individuals who write trading algorithms need to have local access to very specific data as they develop and validate those algorithms.
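As a simple illustration of this kind of workload distribution, the sketch below routes read queries to a replica in the caller’s region while sending writes to the primary. The region names and database paths are invented for the example; a production router would also handle replica health checks and failover.

```python
import random
import sqlite3

# Sketch only: DSNs and regions are hypothetical. Reads are served from
# a nearby replica; anything that modifies state goes to the primary.

class ReadWriteRouter:
    def __init__(self, primary_dsn, replicas_by_region):
        self.primary_dsn = primary_dsn
        self.replicas_by_region = replicas_by_region

    def connection_for(self, query, region):
        is_write = query.lstrip().upper().startswith(
            ("INSERT", "UPDATE", "DELETE"))
        if is_write or region not in self.replicas_by_region:
            return sqlite3.connect(self.primary_dsn)
        # Spread read load across the replicas local to the caller.
        return sqlite3.connect(random.choice(self.replicas_by_region[region]))

router = ReadWriteRouter(
    primary_dsn="primary.db",
    replicas_by_region={"nyc": ["nyc-replica-1.db", "nyc-replica-2.db"],
                        "ldn": ["ldn-replica-1.db"]})

# An algorithm developer in New York reads market data locally:
conn = router.connection_for(
    "SELECT price, ts FROM ticks WHERE symbol = ?", "nyc")
```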

As I mentioned earlier, this scenario is just one example of the importance of making data available at the points of opportunity and action rather than in some centralized server that must be queried - i.e., data-centric versus data-center-focused.

Maintaining a Single Version of the Truth

While it’s important to be able to distribute data to the point of action, it’s equally important to ensure a single version of the truth across the enterprise. You simply cannot operate if different individuals or groups are working from their own versions of the data - which is what happens when there is no way to keep all of the data in synch.

There are different ways to maintain a single version of the truth. A master copy of the data can be maintained in a single repository. Alternatively, you can present that single version of the truth as a federated view, assembled on demand from data residing on different platforms in different locations.

For instance, say you want to review your customer list and have access to their purchasing and payment histories. Customer data is routinely strewn across every enterprise, with different users maintaining the specific data they need to perform their jobs. As a senior manager, however, you might require all of that data - deduplicated, updated and transformed into the same format - to be readily accessible in one view. A federated data management approach can be employed to present the desired single version of the truth required in such an instance.
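A minimal sketch of that federated approach might look like the following. The source systems, field names and merge rule (the most recently updated record wins) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    email: str            # matching key, normalized to lowercase
    name: str
    total_purchases: float
    last_updated: str     # ISO-8601 timestamp

# Each source keeps customer data in its own format; normalizers map
# them onto one canonical shape. Field names here are hypothetical.
def from_crm(row):
    return Customer(row["Email"].strip().lower(), row["FullName"],
                    float(row["LifetimeValue"]), row["ModifiedAt"])

def from_billing(row):
    return Customer(row["email_addr"].strip().lower(), row["customer_name"],
                    float(row["payments_total"]), row["updated"])

def federated_customer_view(crm_rows, billing_rows):
    merged = {}
    records = [from_crm(r) for r in crm_rows] + \
              [from_billing(r) for r in billing_rows]
    for cust in records:
        current = merged.get(cust.email)
        # Deduplicate on the normalized key; keep the freshest record.
        if current is None or cust.last_updated > current.last_updated:
            merged[cust.email] = cust
    return sorted(merged.values(), key=lambda c: c.email)
```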

The point here is that it shouldn’t matter where data is located. Using distributed data management techniques, everyone can and must see the same version of the data because there can only be one version of the truth if individuals are to make the right business decisions at the right times.

High Availability and Disaster Recovery

A data-driven organization that suddenly finds itself without access to its enterprise data is for all intents and purposes out of business until access is restored. It might be a case of a server failure due to an overload, a data center going offline due to a natural disaster, a network interruption due to a construction mishap or some other outage due to human error. Whatever the cause, many businesses simply cannot function if employees, customers and business partners are cut off from data and key applications.

To protect themselves against such potentially devastating events, companies create remote failover sites or disaster recovery centers at which they install hardware and software that duplicate their primary systems. These may be identically mirrored sites or sites equipped with very similar hardware and software. The key is that the remote databases must be kept in synch with the primary database or databases. This, too, can be accomplished in several ways, including batch movement, asynchronous or synchronous database replication, storage replication or some combination of the above. Decisions on which technologies and methods to employ depend on the criticality of the particular application being backed up.
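The toy sketch below shows the shape of the asynchronous option: writes commit on the primary immediately, and a background thread ships them to the standby, which may therefore lag slightly. Real replication persists the change log and handles failover; the in-memory dictionaries and queue here are stand-ins for illustration only.

```python
import queue
import threading

class AsyncReplicatedStore:
    def __init__(self):
        self.primary = {}           # stand-in for the primary database
        self.standby = {}           # stand-in for the remote failover copy
        self._log = queue.Queue()   # stand-in for a persistent change log
        threading.Thread(target=self._apply_forever, daemon=True).start()

    def write(self, key, value):
        # Asynchronous: the write is acknowledged as soon as the primary
        # has it; the standby catches up in the background.
        self.primary[key] = value
        self._log.put((key, value))

    def _apply_forever(self):
        while True:
            key, value = self._log.get()
            self.standby[key] = value
            self._log.task_done()

    def wait_for_standby(self):
        # Blocking here before acknowledging a write is, in effect,
        # what synchronous replication does.
        self._log.join()
```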

Additional Considerations

There are plenty of other scenarios we could consider, but these should suffice for our discussion. The key points are:   

  • Data is a strategic asset, the value from which is best derived when it can be delivered to the point of decision or action.
  • Given this, distributed data management competence becomes very important.
  • A critical element to consider when implementing distributed data management is the requirement that there must be a single version of the truth. There must be a clear understanding of how that will be achieved and maintained.
  • A variety of data movement and data synchronization technologies and techniques are available to enable you to place accurate and current information where it needs to be for a broad range of user types.

Beyond the obvious need to move data around the enterprise at different rates of frequency and levels of specificity, there are some overarching issues that must also be addressed.

Security is a must while data is being moved, while it is at rest in repositories and while it is being queried, analyzed, updated and so forth. This falls under the larger issue of data governance. Policies must be developed and put in place to govern access to and use of data based on a variety of parameters - some common sense and others unique to your business.
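As a simple illustration of the governance point, access policies can be expressed as data and checked before any query runs. The roles, datasets and masking rules below are invented examples, not a recommended policy set.

```python
# Hypothetical policy table: which roles may read a dataset, and which
# fields must be masked for them even when access is granted.
POLICIES = {
    "customer_pii": {"allowed_roles": {"support", "compliance"},
                     "mask_fields": {"ssn", "card_number"}},
    "sales_reporting": {"allowed_roles": {"analyst", "executive"},
                        "mask_fields": set()},
}

def authorize(role, dataset):
    """Return the fields to mask, or raise if access is denied outright."""
    policy = POLICIES.get(dataset)
    if policy is None or role not in policy["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not read {dataset!r}")
    return policy["mask_fields"]

to_mask = authorize("support", "customer_pii")  # caller must mask these fields
```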

The scalability of your systems also comes into play considering the ever-growing population of data consumers. Distributing data to the points of consumption makes perfect sense from performance, availability and systems reliability points of view. However, data distribution must be managed. It cannot be allowed to self-propagate in a willy-nilly fashion. This requires a strategic approach to distributed data management.

One technology not yet mentioned, but critical to developing an effective distributed data management strategy, is data modeling. Data modeling technology is the antidote to willy-nilly data propagation. It enables you to gather and analyze your data and metadata and determine the most effective ways to put that data to work for your organization. It’s key to developing an enterprise information architecture that identifies the requirements, principles and models needed to flexibly share and exchange information across your organization. That, in turn, is invaluable in helping you leverage your information assets to support your organization’s business strategies.
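In that spirit, even a lightweight model of your entities and where their authoritative and replicated copies live goes a long way toward deliberate, rather than willy-nilly, propagation. The entity and system names below are placeholders for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class EntityModel:
    name: str
    attributes: dict            # attribute name -> type, e.g. "email": "str"
    system_of_record: str       # the one authoritative location
    replicas: list = field(default_factory=list)

# Hypothetical registry entry: every copy of customer data is recorded
# against the model, so distribution stays planned and auditable.
registry = {
    "customer": EntityModel(
        name="customer",
        attributes={"customer_id": "int", "email": "str", "region": "str"},
        system_of_record="crm.masterdb",
        replicas=["reporting.warehouse", "nyc.read-replica"]),
}

def register_replica(entity, location):
    registry[entity].replicas.append(location)
```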

Begin With a Project

This may seem a bit overwhelming at first read, but it needn’t be so. The first step is to take stock of your data assets - where they are, how they are used and how they could be used more effectively.

Then identify a project or two to get started. Distributed data management is not something that has to be (or should be) done as a single, massive undertaking.

Involve IT and business users in the discussions about the projects to ensure you understand the requirements and the implications they have for the various technologies you’ll need to employ.

Remember, your data is much too important an asset to let it go unmanaged or undermanaged. Today, when business success requires virtually instant access to data and applications, anywhere and anytime, a distributed data management system is a must-have. Take heart, though. Getting from where you are now to there is eminently doable.
