NOV 3, 2009 5:23am ET

Moving Data to Enterprise Clouds

Data clouds pose an interesting dilemma for enterprise IT organizations. On one hand, they promise to drastically reduce the cost and complexity of storing enterprise data. On the other hand, they create numerous migration challenges. When considering a data cloud implementation, enterprises currently have two primary options: they can deploy an internal data cloud, or they can rely on an existing third-party public data cloud such as Amazon Simple Storage Service (Amazon S3) or Rackspace Cloud. While some fundamental challenges, such as appropriate security and governance, potentially exist in both deployment scenarios, a public cloud deployment has an additional and critical limitation: moving a large amount of data into a public cloud can take months or years because of insufficient network bandwidth. Werner Vogels, CTO of Amazon.com, describes this issue in his blog, “All Things Distributed.” Vogels contrasts the number of days it would take to transfer a set of data to Amazon at different network bandwidths. See Figure 1 for a partial set of his findings.

In other words, transferring one terabyte of enterprise data to a public cloud at typical public network speeds, from a T1 line (1.544 Mbps) to 10 Mbps, takes between 13 and 82 days. Considering that most large enterprises have data volumes reaching several petabytes, one might conclude that using a public cloud is not as practical as initially predicted.
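The arithmetic behind these figures can be checked with a short Python sketch. The 80 percent link utilization and the binary terabyte (2^40 bytes) used here are assumptions not stated in this article; they were chosen because they reproduce the 13-day and 82-day figures quoted above. The function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400
ONE_TERABYTE = 2 ** 40  # assumption: binary terabyte, in bytes


def transfer_days(data_bytes, link_mbps, utilization=0.8):
    """Estimate days needed to push data_bytes over a link.

    utilization=0.8 is an assumption: the share of nominal
    bandwidth actually available for the transfer.
    """
    bits = data_bytes * 8
    effective_bits_per_second = link_mbps * 1_000_000 * utilization
    return bits / effective_bits_per_second / SECONDS_PER_DAY


for name, mbps in [("T1 (1.544 Mbps)", 1.544), ("10 Mbps", 10.0)]:
    print(f"{name}: {transfer_days(ONE_TERABYTE, mbps):.1f} days")
```

Under the same assumptions, one petabyte over a 10 Mbps link works out to 1,024 times the one-terabyte figure, which is why the bandwidth constraint dominates at enterprise data volumes.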

Migrating Data into the Cloud

While some companies have standardized on a small number of data sources, the majority of medium-sized and large enterprises use a wide variety of relational, nonrelational and packaged application data sources (see Figure 2). The number of data sources typically reaches into the hundreds, with new ones added every month. In addition, the average enterprise runs several versions of the same data source (some companies have reportedly deployed up to three different versions of the same product from a single vendor) alongside a broad variety of data source types. The result is that migrating all the enterprise data into a cloud in a meaningful way will take a Herculean effort.

There are two popular approaches for migrating enterprise data into the cloud. The first is to batch-load from the data sources directly into the cloud. To determine whether this is the best choice, the IT organization should establish whether the data stored in the cloud will be shared among many applications or compartmentalized for use by individual applications. If multiple applications use the same data sets predominantly for read-only purposes, then sharing these data sets in the cloud is likely to be safe. However, many enterprises copy original data into multiple locations, either to improve the performance of local applications or to combine it with other data to meet business users’ needs. In that case, the better path might be for IT to create a consolidated model that can be ported into the cloud, built through data discovery and analysis that identifies all copies and permutations of the data. On the other hand, if multiple applications update and write to the data source, then compartmentalizing the data set for exclusive use by individual applications is probably the only viable option (see Figure 3).
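The sharing rule described above can be sketched as a small classification step. This is only an illustration of the rule of thumb, not a prescribed method: the `Access` record, the `placement` function and the application and data set names are all hypothetical, and a real assessment would also weigh the copies and permutations found during data discovery.

```python
from dataclasses import dataclass


@dataclass
class Access:
    app: str       # hypothetical application name
    dataset: str   # hypothetical data set name
    writes: bool   # True if the application updates or writes the data set


def placement(accesses):
    """Map each data set to 'shared' or 'compartmentalized'.

    Rule of thumb from the text: data sets written by more than one
    application are compartmentalized; everything else can be shared.
    """
    writers = {}
    datasets = set()
    for access in accesses:
        datasets.add(access.dataset)
        if access.writes:
            writers.setdefault(access.dataset, set()).add(access.app)
    return {
        ds: "compartmentalized" if len(writers.get(ds, ())) > 1 else "shared"
        for ds in datasets
    }


observed = [
    Access("reporting", "customers", writes=False),
    Access("billing", "customers", writes=False),
    Access("billing", "orders", writes=True),
    Access("fulfillment", "orders", writes=True),
]
print(placement(observed))
```

Here the hypothetical `customers` data set is read by two applications and written by none, so it is a candidate for sharing, while `orders` has two writers and would be compartmentalized.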

Once IT identifies the data to be migrated, it develops extract, transform and load (ETL) batch scripts and runs them to migrate the data into the cloud. This process can take from several hours to several days, depending on the volume of data to be moved. Because the data in the originating sources typically continues to be updated during the migration, the IT organization also needs to develop scripts that synchronize changes and run periodically until the enterprise applications are “completely switched” to work against the cloud data. All in all, this is a time- and resource-intensive process from start to finish.
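A minimal sketch of this two-phase pattern, an initial batch load followed by periodic synchronization, might look like the following. It uses in-memory dictionaries in place of real data sources and cloud storage, and it assumes each source row carries a monotonically increasing `version` number for detecting changes; that column, the function names and the sample rows are all hypothetical. Deleted rows are not handled.

```python
def batch_load(source, cloud, transform):
    """Initial ETL pass: copy every row and return the high-water mark."""
    for key, row in source.items():
        cloud[key] = transform(row)
    return max((row["version"] for row in source.values()), default=0)


def sync_changes(source, cloud, transform, watermark):
    """Periodic pass: copy only rows changed since the last watermark."""
    newest = watermark
    for key, row in source.items():
        if row["version"] > watermark:
            cloud[key] = transform(row)
            newest = max(newest, row["version"])
    return newest


def clean(row):
    # Example transformation: trim whitespace, drop the bookkeeping column.
    return {"name": row["name"].strip()}


source = {1: {"name": " Acme ", "version": 1},
          2: {"name": "Globex", "version": 2}}
cloud = {}

mark = batch_load(source, cloud, clean)

# The source keeps changing while the migration is in progress.
source[1] = {"name": "Acme Corp", "version": 3}
source[3] = {"name": "Initech", "version": 4}

# Repeat on a schedule until applications are switched to the cloud data.
mark = sync_changes(source, cloud, clean, mark)
print(cloud, mark)
```

The watermark keeps each synchronization pass proportional to the amount of change rather than to the full data volume, but the scripts still have to be written and maintained per source, which is where much of the effort goes.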

The second option for data migration into the cloud is data virtualization, which offers several key advantages over manual batch loading. First, data virtualization fully abstracts the data from both the sources and the accessing applications. Thus, a data model put in place for the data virtualization layer can also serve as the initial data cloud model for the data sets it abstracts. Second, instead of batch loading the entire data set, data virtualization allows the IT organization to load data into the cloud on demand. The IT organization accomplishes this by configuring the data cloud to use the data virtualization layer as a single data source. Third, data virtualization removes the complexities associated with continuous changes to the data sets by allowing a phased migration: some enterprise applications continue to access the data through the virtualized layer, while others access the data in the cloud. Changes made in the cloud are automatically synchronized back to the originating data sources through the data virtualization middleware’s pass-through capabilities (see Figure 4).
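The on-demand loading and pass-through behavior can be illustrated with a small sketch. Both classes below are hypothetical stand-ins, not the API of any data virtualization product: `VirtualLayer` represents the virtualization layer in front of the original sources, and `DataCloud` treats that layer as its single data source, loading a record on first read and passing writes back through.

```python
class VirtualLayer:
    """Hypothetical virtualization layer exposing many sources as one."""

    def __init__(self, sources):
        self.sources = sources  # {source_name: {key: value}}

    def read(self, source, key):
        return self.sources[source][key]

    def write(self, source, key, value):
        self.sources[source][key] = value


class DataCloud:
    """Hypothetical cloud store that uses the layer as its only source."""

    def __init__(self, layer):
        self.layer = layer
        self.store = {}

    def get(self, source, key):
        # Load on demand: fetch through the layer only on first access.
        if (source, key) not in self.store:
            self.store[(source, key)] = self.layer.read(source, key)
        return self.store[(source, key)]

    def put(self, source, key, value):
        # Pass-through: update the cloud copy and the originating source.
        self.store[(source, key)] = value
        self.layer.write(source, key, value)


crm = {"c1": "Acme", "c2": "Globex"}
layer = VirtualLayer({"crm": crm})
cloud = DataCloud(layer)

first_read = cloud.get("crm", "c1")    # loaded on demand
cloud.put("crm", "c1", "Acme Corp")    # written through to the source
legacy_view = layer.read("crm", "c1")  # what a not-yet-migrated app sees
print(first_read, legacy_view, len(cloud.store))
```

Only the record that was actually requested ends up in the cloud store, and an application still reading through the layer sees the change made in the cloud. The sketch ignores writes made directly to the source after a record is cached, which a real layer would need to propagate or invalidate.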
