One of the top challenges for business intelligence (BI) and analytics professionals in a cloudy, big data world is data movement and integration. The growing complexity of reporting across large volumes of heterogeneous data stored in different environments is daunting because of the principles of data gravity. Even with fast networks and improved caching, architects need to design solutions with distance, bandwidth, throughput, and latency performance considerations in mind.

Data gravity introduces significant industry challenges. BI has primarily lived on-premises, with only a minuscule 2 percent of BI applications living in the cloud. Even as the industry rapidly shifts more and more apps to the cloud, data warehouses and many other data sources often remain on-premises for a long time. Thus, we anticipate an increased need for BI apps to query across both realms, on-premises and cloud, as the latter matures.

What is Data Gravity?

Data gravity is an undeniable market force that we’re seeing in our BI industry mid-life crisis. In the mobile- and cloud-first world, a myriad of apps for every conceivable function generate more data in the cloud than on-premises. As more apps are delivered via mobile, cloud, and Software as a Service (SaaS), the center of data gravity is already shifting.

Last year at the Gartner BI Summit, data gravity came up in several sessions. After doing a bit of research on the concept, I found Dave McCrory’s blogs to be the most enlightening and educational on this topic. He has even developed a few clever formulas for calculating data gravity, data physics models, application mass and other related areas.

“Consider Data as if it were a Planet or other object with sufficient mass. As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data. This is the same effect Gravity has on objects around a planet. As the mass or density increases, so does the strength of gravitational pull. As things get closer to the mass, they accelerate toward the mass at an increasingly faster velocity.

How does one defy data gravity? You can only shuffle data around so quickly. Even with the fastest networks, you are still bound by distance, bandwidth, and latency. All of these are bound by time, which brings us to speed of light. You can only transfer so much data across the distance of your network. In a perfect world, the speed of light becomes the limitation. At some point, it becomes impossible to move an app, service, or workload outside of the boundaries of its present location.” – Dave McCrory
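
The limits described in the quote above can be put into rough numbers. The following is a minimal back-of-the-envelope sketch in Python, not a formula from McCrory's work. The function names, the 80 percent link-efficiency default, and the figure of roughly 200,000 km/s for light in optical fiber are illustrative assumptions.

```python
# Rough estimates only: real transfers also depend on protocol overhead,
# congestion, and storage throughput at both ends.

# Assumption: light travels through optical fiber at about two-thirds of
# its speed in a vacuum, or roughly 200,000 km per second.
FIBER_LIGHT_SPEED_KM_PER_S = 200_000


def transfer_time_seconds(size_bytes, bandwidth_bits_per_s, efficiency=0.8):
    """Estimate the time needed to move size_bytes over a network link.

    efficiency is the (assumed) fraction of nominal bandwidth that is
    actually usable for the payload.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth_bits_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return size_bytes * 8 / (bandwidth_bits_per_s * efficiency)


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip latency imposed by the speed of light."""
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    return 2 * distance_km / FIBER_LIGHT_SPEED_KM_PER_S * 1000


if __name__ == "__main__":
    # Hypothetical scenario: a 100 TB warehouse, a 1 Gbps link, 4,000 km apart.
    seconds = transfer_time_seconds(100 * 10**12, 10**9)
    print(f"Bulk copy: about {seconds / 86_400:.1f} days")
    print(f"Best-case round trip: about {min_round_trip_ms(4_000):.0f} ms")
```

Under these assumptions, a one-time copy of a 100 TB warehouse takes more than eleven days, and every query round trip pays a latency floor of tens of milliseconds that no network upgrade can remove.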

Data Gravity Impacts on BI

BI and analytics pros cannot ignore the value of data in the cloud. The gorgeous dashboards they create typically are not at the dead end of a one-way street. Most analytics projects are highly iterative in nature. Dashboards enlighten a decision maker, who in turn sends action items for adjustment. The business process being monitored by a dashboard is continuously tuned for optimal performance at various points from various data sources. To effectively deliver iterative intelligence, data and added context must flow rapidly back and forth between apps, data sources, and analytical assets regardless of where they live, whether on-premises or in the cloud.

In the past, teams extracted and downloaded Google Analytics and Salesforce cloud data into a client’s on-premises data warehouse. Today, though, many more line-of-business apps are in the cloud, including Marketo, Dynamics, and Workday.

The decisions to copy or move cloud data for analytical purposes are becoming more complicated as cloud data volumes grow.

According to Kevin Petrie, senior director of product marketing at Attunity, “In a variety of use cases today, data center managers are in the process of migrating workloads to the cloud to take advantage of its unique benefits including elasticity and cost advantages. The cloud is particularly advantageous for experimental and/or fast-changing analytics initiatives. Wherever your data lives, you should be able to extract value from it. That’s where hybrid BI capabilities are becoming essential for BI pros to understand and utilize where it makes sense to do so.”

Data Gravity Pain

The inevitable need to move data closer to cloud BI solutions is why you see freemium or low-cost offerings such as Microsoft Power BI, Amazon QuickSight, and Google Data Studio disrupting the market. Essentially, the loss of on-premises and cloud BI app revenue is offset by selling much more profitable cloud data warehouses like Microsoft Azure SQL Data Warehouse, Amazon Redshift, and Google BigQuery.

Creative data movement offerings like Amazon Snowball are great for one-time data transfers, but they do not fulfill ongoing, diverse requirements. High-performance data replication is often the best method of defying the principles of data gravity while also providing flexibility for handling a wide variety of analytics scenarios and data sources.

Data Gravity Solutions

To avoid data gravity pain, businesses should explore solutions that automate the movement and transformation of data across many different data sources and cloud environments. Such solutions should work seamlessly across heterogeneous cloud environments and data sources to simplify complex data integration. For BI and analytics professionals, the solution should be well suited to data warehouse/ETL automation, change data capture (CDC), and replication in a hybrid BI architecture.

Solutions should fulfill an extensive range of data integration requirements: data distribution, migration, query offloading for reporting, and real-time business analytics, whether on-premises or in the cloud. Next-generation CDC technology and intelligent in-memory transaction streaming significantly improve replicated data delivery times and data movement efficiency.
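
To make the idea of change data capture concrete, here is a minimal, vendor-neutral sketch in Python. It does not represent any particular product's implementation. The `ChangeEvent` and `Replica` names, the sequence-number scheme, and the in-memory dictionary standing in for a target table are all hypothetical. The point it illustrates is that the replica receives only the changes made since its last applied position rather than a full copy of the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a (hypothetical) ordered change log from the source."""
    seq: int                    # position in the source's change log
    op: str                     # "insert", "update", or "delete"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; None for deletes


@dataclass
class Replica:
    """An in-memory stand-in for a replicated target table."""
    rows: dict = field(default_factory=dict)
    last_seq: int = 0  # highest change-log position applied so far

    def apply(self, events):
        """Apply unseen events in log order and return how many were applied."""
        applied = 0
        for event in sorted(events, key=lambda e: e.seq):
            if event.seq <= self.last_seq:
                continue  # already applied, so redelivery is harmless
            if event.op in ("insert", "update"):
                self.rows[event.key] = dict(event.row or {})
            elif event.op == "delete":
                self.rows.pop(event.key, None)
            else:
                raise ValueError(f"unknown operation: {event.op}")
            self.last_seq = event.seq
            applied += 1
        return applied


if __name__ == "__main__":
    change_log = [
        ChangeEvent(1, "insert", "order-1", {"amount": 10}),
        ChangeEvent(2, "update", "order-1", {"amount": 15}),
        ChangeEvent(3, "insert", "order-2", {"amount": 7}),
        ChangeEvent(4, "delete", "order-2"),
    ]
    replica = Replica()
    replica.apply(change_log[:2])  # first sync ships two changes
    replica.apply(change_log)      # second sync skips what was already applied
    print(replica.rows, replica.last_seq)
```

Tracking the last applied position is one simple design choice for keeping replication incremental and safe to retry. Production CDC tools typically read the database transaction log and handle schema changes, ordering across tables, and failures, none of which this sketch attempts.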

Businesses will want a solution that is simple to deploy yet offers secure, scalable, performant replication between mixed data sources regardless of location.

Hybrid data source reporting is a common requirement today. Despite advances in remote data source connectivity and querying technologies, the physics of data gravity alone dictates a continued need for flexible, diverse data movement options. With the right solution for heterogeneous data integration, securely and efficiently transferring data between different cloud providers, data source types, and data sizes becomes far easier, and data can be delivered rapidly for real-time analytics.

(About the author: Jen Underwood is founder of Impact Analytix)