Data Movement in the Internet Age

Published April 1, 1999

Once upon a time, the holy grail of computing was the integrated database. As companies implemented one application after another, they also rode the successive waves of data management technology: file management systems, then pointer-based hierarchical and network (e.g., CODASYL) DBMSs, and then relational DBMSs. For years, the thought was that "someday" the company would standardize on a modern database architecture, create an integrated centralized database, eliminate data redundancy and reap many benefits in the process.

And like the holy grail story portrayed in Indiana Jones and the Last Crusade, it was important to "choose wisely." If the right DBMS was selected and the right data model was implemented, all would be well.

Of course, it never happened. Fueled by the rise of client/server and distributed platforms, and reinforced by the fact that highly tuned legacy applications are almost never rewritten, IT organizations have presided over a proliferation of databases. Instead of data integration, most organizations have data extraction and data replication. Different corporate databases contain overlapping data, and consistency is often achieved through data movement.

And now, the phenomenon of the Internet is taking data proliferation to new extremes.

Today, everyone is networked. Suppliers, partners, business units, resellers and customers are all interconnected by the backbone of the World Wide Web. The demand to push operational data out beyond the walls of the corporation has outstripped the ability of database and data movement technology to deliver.

Dramatic shifts in thinking have occurred. Instead of shielding corporate data from outsiders, businesses are pressured to distribute it (securely) to an ever-widening audience of suppliers, customers and employees. To support Web-based business initiatives, several kinds of data movement applications are needed:

1. One-Way Publishing. This involves moving data to a Web server for inquiry only and is often the first stage in a more comprehensive e-business strategy. Publishing data via the Web can enhance customer service levels while reducing traffic in overburdened telephone response departments. But predictably, when useful data is made available to customers, it isn't long before they want more sophisticated capabilities.

For instance, one mail order business put its catalog on its new Web site and immediately got traffic. But that traffic increased dramatically when the site was enhanced to allow customers to respond to mailings and interactively order merchandise. Not only was customer service enhanced (because customers no longer had to mail those pesky "if-you-don't-respond-by-a-certain-date-we-will-send-you-products-you-don't-want" return cards), but the company was also able to interact with customers more often and offer them more opportunities to buy.


Figure 1: Everyone is networked via the World Wide Web

2. Moving Data Out to Disconnected Web Users. Many organizations are going beyond data publishing to Web-based data movement. Not satisfied with merely presenting data through the Web browser interface, they are allowing employees, suppliers and customers to download appropriate data for use in local, disconnected applications.

Examples include a customer downloading the portions of a product catalog that correspond to what that customer typically orders, or a sales representative downloading profile information for the customers in his territory.

3. Synchronization. This, of course, is the next logical step. Once data has been distributed via the Web, it is possible for remote business processes to update it. Updates can occur outside the company walls (e.g., when a sales person takes orders from a customer and enters the data on a laptop computer) or inside (e.g., when a customer mails a reply card to a centralized order-processing department).

Modern business operates in a continuum that extends from supply chain partners, through the corporation and out to customers. The computing platforms and database technologies are highly diverse. To enable key business processes, certain corporate data is duplicated at these remote, heterogeneous and intermittently connected locations. When updates occur, databases become inconsistent. Data synchronization technology supports both the movement and the reconciliation of corporate data that is being distributed via the Web and updated in local business events.
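
To make the reconciliation step concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that every row carries a last-updated timestamp and that a simple newest-wins policy is acceptable; commercial synchronization products offer far richer conflict-resolution options.

    from dataclasses import dataclass
    from datetime import datetime

    # Hypothetical row shape: each record carries a timestamp so central
    # and remote copies can be compared after a period of disconnection.
    @dataclass
    class Row:
        key: str
        payload: dict
        updated_at: datetime

    def reconcile(central, remote_changes):
        """Merge rows edited offline into the central store (newest wins).

        Rows whose central copy is at least as new as the incoming edit
        are returned for review rather than silently overwritten.
        """
        needs_review = []
        for row in remote_changes:
            current = central.get(row.key)
            if current is None or row.updated_at > current.updated_at:
                central[row.key] = row       # remote edit wins
            else:
                needs_review.append(row)     # central copy is newer
        return needs_review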

Custodians of corporate data have historically struggled with the tension between centralized control and distributed processing. Just in the area of end-user access, different approaches evolved to circumvent the IT application development logjam and empower end-user decision making: information centers, decision support systems, data warehouses, data marts, etc. Often, these approaches allowed corporate data to be distributed in a read-only fashion; updates were still the province of centralized applications.

Databases began to proliferate dramatically when major departmental systems went to a client/server architecture. Still, data consistency could be managed if these systems were largely autonomous and connections between them and centralized legacy applications were few and well-defined. But the emergence of the Internet has changed everything.

Today, enterprises are networked with each other and with their customers. Connections are asynchronous and often unpredictable. New business paradigms demand that IT implement a more far-reaching strategy for data sharing. Potentially large amounts of data must be tailored appropriately for recipients, moved expeditiously and kept current through well-designed refresh strategies. What are some of the keys to a successful program for Web-enabling corporate data?

Know the Business Drivers

There must be a business reason for Web-enabling corporate data. And those reasons often mature over time: if you are seeking to give customers access to your product catalog, how long will it be before you also want to give them current inventory data, accept orders and provide feedback on order and account status? Will the application simply publish data, or will it also allow that data to be changed?

Identify Your Operational Expectations

This includes such things as knowing the quantity of data to be moved, the number of remote locations to which it is going and the bandwidth required to get the job done. Often, a large corporate database will be highly tuned to support the company's mainstay production systems. Rather than allow direct access via the Internet, which brings wildly unpredictable peaks and valleys in demand, data is instead moved to a Web database server. This raises the questions of how often the data should be refreshed and by what means remote updates will be pushed back into the central database.
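
One widely used answer to the refresh question is to keep a high-water mark and ship only the rows that changed since the last cycle. The Python sketch below uses SQLite to stand in for both databases; the catalog table and its last_modified column are assumptions made for illustration, not a prescription.

    import sqlite3

    def refresh_web_copy(corporate, web, since):
        """Push only the net changes made after 'since' (an ISO-8601
        timestamp string) from the corporate database to the Web copy."""
        rows = corporate.execute(
            "SELECT sku, description, price, last_modified "
            "FROM catalog WHERE last_modified > ?",
            (since,),
        ).fetchall()
        web.executemany(
            "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?)", rows
        )
        web.commit()
        # The largest timestamp shipped becomes the next high-water mark.
        return max((r[3] for r in rows), default=since)

Shipping only the delta keeps refresh windows short even when the full catalog is very large.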


Figure 2: Web-Based Data Movement

Define the Segmenting Criteria

For large enterprises with complex databases, the problem of segmenting corporate data is a major issue. Perhaps a Web-based application will publish data to hundreds of suppliers, but each supplier gets his own unique segment of the central database. Information pertinent to a supplier is scattered across dozens of interrelated tables in the corporate database. How can the database be sliced uniquely for each supplier? More importantly, how will updates to tables that may be far from the starting supplier table be recognized as belonging to the right supplier and distributed accordingly the next time that supplier is connected?
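
As a simple illustration of the traversal involved, the Python sketch below slices two hypothetical tables for a single supplier. Orders name their supplier directly, but shipments reach the supplier only through their parent order, so membership must be inherited across the foreign key; in a real corporate schema, that inheritance can run dozens of tables deep.

    def slice_for_supplier(db, supplier_id):
        """Return the subset of each table that belongs to one supplier.

        db maps table names to lists of row dictionaries (hypothetical).
        """
        orders = [o for o in db["orders"] if o["supplier_id"] == supplier_id]
        order_ids = {o["order_id"] for o in orders}
        # Shipments carry no supplier_id of their own; they belong to a
        # supplier only through the order they reference.
        shipments = [s for s in db["shipments"] if s["order_id"] in order_ids]
        return {"orders": orders, "shipments": shipments}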

Deal with Data Model Issues

Complex corporate databases usually contain hundreds of interrelated tables. In seeking to Web-enable such data, IT organizations sometimes resort to "flattening" the data model: denormalizing the data by combining many tables and thereby reducing the number of tables involved in the Web database. The hoped-for benefit is to overcome the problem of data segmentation: combining tables makes it easier to "know" which updates go with which suppliers.

This is usually a false economy, however. When databases are denormalized, significant problems emerge. Referential integrity can be compromised; data models are inconsistent between the Web and corporate databases; and serious breaches in consistency become possible. It is far better to implement a Web-enabling strategy with data models that are as consistent as possible.

Another serious problem with denormalized Web databases appears when the corporate data model is changed or the rules for segmenting the data change. Such changes are almost impossible to duplicate in a flattened database, and expensive restructuring and rewriting of data movement programs become necessary, forcing corporate business initiatives to wait.
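
A small Python sketch makes the trade-off visible (the tables and columns are, again, hypothetical). Flattening does make every row self-describing for segmentation purposes, but it multiplies the copies of each supplier attribute and welds the Web data to today's upstream schema:

    # Normalized form: each supplier attribute is stored exactly once.
    suppliers = {101: {"name": "Acme Corp", "region": "West"}}
    orders = [{"order_id": 1, "supplier_id": 101, "qty": 40}]

    # Flattened Web copy: supplier columns repeated on every order row.
    flat_orders = [
        {"order_id": o["order_id"], "qty": o["qty"],
         **suppliers[o["supplier_id"]]}
        for o in orders
    ]
    # Segmentation is now trivial (every row names its supplier), but any
    # change to a supplier attribute must be rewritten into every flattened
    # row, and a schema change upstream forces this code to change with it.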

As already mentioned, inter-networked technical environments are heterogeneous. To what degree must the strategy for Web-enabling corporate data account for this diversity? In sophisticated business initiatives, where suppliers are able to trigger updates to a partner's corporate data, heterogeneity must be planned for and managed.

In addition, a strategy for handling bidirectional updates must be developed, along with the correct data movement strategy for refreshing remote data. Horror stories abound. In one case, sales representatives were given catalog information for their laptops, but it took four hours to refresh the data. The result: all of the sales representatives downloaded the information exactly once, and never bothered with it again.

Most IT organizations know that the holy grail of a single integrated corporate database is a throwback to the days of the glass house and centralized computing. What is not as widely recognized is the fact that data movement is an important IT support function that needs to be well-planned and effectively architected.

With the advent of Web-based business initiatives, data is moving out further than ever before. Web-enabled corporate data, if properly leveraged, has the potential to pay significant returns to the enterprise. But to succeed, new applications must be supported by a data movement strategy that can accurately subset very complex databases, effectively manage referential integrity and data model issues, quickly refresh remote databases by sending only net changes and safely synchronize data that has been updated in more than one location.
