Exploring the Causes of Data Explosion
Many sources contribute to this remarkable data growth. In addition to the vast amounts of data coming from mobile, Internet and traditional sources, millions of users are evolving from information consumers into producers, creating their own data at any given moment. Companies therefore need a way to blend data from all these sources and provide intelligence around it. To aggregate and make sense of it all, a company’s data strategy must account for significantly larger and constantly growing volumes of information, so that data performance problems do not disrupt business operations.
For example, a data performance bottleneck that prevents IT from meeting nightly load windows for a mission-critical application means the next day’s operational data is out of date, introducing risk and error into business decisions and potentially adding unexpected costs to the business. For a major health care management organization, failure to deliver hundreds of thousands of reports containing daily claims to insurance companies can result in critical errors in administering and delivering health benefits to individuals. Similarly, at this very moment, a leading online retailer must analyze hundreds of millions of records to derive critical information about customer preferences, online behavior and the latest trends. Failure to do so can result in severe revenue losses and customer attrition.
Common Approaches to Cope with Data Explosion
When data performance problems arise as data volumes grow, companies often turn to traditional methods: adding more hardware, pushing data transformations elsewhere (for example, down into the database) or custom coding. Though common, these methods are typically not the best way to tackle big data and can actually hinder an organization’s ability to adapt and respond quickly to changing business demands.
For example, adding hardware may shorten the elapsed time for data processing tasks, but it is costly at every stage, from initial implementation to ongoing maintenance. Moreover, hardware alone can no longer keep up with the data growth rate many organizations are experiencing today. Pushing all heavy transformations out of ETL platforms and into the database creates other problems for the organization, such as an inability to maintain data lineage and reduced agility. In many cases, companies find that they cannot deliver reports on time or cope with new requests for information. Custom coding can quickly become riddled with problems, given its complex upkeep, ongoing labor costs and manageability issues.
The Recommended Approach: Bringing the Focus Back to Data Integration
Because data warehouses built on today’s commercial databases are no longer economically or physically capable of managing big data, new technology frameworks like Hadoop are emerging to track and manage these unprecedented volumes of data. When devising a big data strategy, companies therefore need to account not only for enterprise data but also for new sources of data, and then determine the best way to integrate the two for timely, accurate access to information as a basis for making business decisions.
Whether you use a data warehouse or a Hadoop environment to manage big data, an important first step is to assess the effectiveness of the data integration function. In other words, is it successfully transforming data into value? If you find that improvements could be made, determine where the data performance bottlenecks are likely to occur. Most likely, you will find that delays occur during complex sorts, joins and aggregations of large volumes of data. The best approach is to identify and target the top 20 percent of these processes in terms of elapsed time and complexity, since these are typically the jobs that cause 80 percent of the problems. Addressing them first can yield relatively quick and easy gains with significant benefits to the organization. To manage big data for the long term as data volumes continue to grow, you will want to eliminate the need for tuning and create a fast, efficient, simple and cost-effective data integration environment.
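As a rough illustration of the 20 percent rule, the short Python sketch below ranks jobs by elapsed time and reports how much of the total run time the slowest fifth accounts for. It is a minimal sketch under stated assumptions, not a prescribed method: the job names, the durations and the function top_jobs_by_elapsed are all hypothetical, and it ranks by elapsed time only, leaving out complexity, which is harder to quantify.

```python
import math

# Hypothetical job log: (job name, elapsed minutes). These values are
# illustrative only and do not come from any real system.
JOB_LOG = [
    ("load_claims", 240),
    ("sort_transactions", 180),
    ("join_customers_orders", 35),
    ("aggregate_daily_sales", 30),
    ("dedupe_contacts", 20),
    ("export_reports", 15),
    ("refresh_lookup", 10),
    ("archive_logs", 8),
    ("validate_schema", 7),
    ("purge_temp", 5),
]


def top_jobs_by_elapsed(jobs, fraction=0.2):
    """Return the slowest `fraction` of jobs and their share of total elapsed time."""
    if not jobs:
        return [], 0.0
    # Rank jobs from slowest to fastest.
    ranked = sorted(jobs, key=lambda job: job[1], reverse=True)
    # Always select at least one job, rounding the cutoff up.
    count = max(1, math.ceil(len(ranked) * fraction))
    selected = ranked[:count]
    total = sum(minutes for _, minutes in ranked)
    share = sum(minutes for _, minutes in selected) / total if total else 0.0
    return selected, share


if __name__ == "__main__":
    selected, share = top_jobs_by_elapsed(JOB_LOG)
    for name, minutes in selected:
        print(f"{name}: {minutes} min")
    print(f"Share of total elapsed time: {share:.0%}")
```

In practice the input would come from a scheduler or ETL run log rather than a hard-coded list, and the share figure shows whether the 80/20 pattern actually holds for a given workload before any tuning effort is committed.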
Making Big Data Work for You
In today’s 24x7 business world, with its demand for timely and relevant information, devising a forward-looking big data strategy is critical to ensuring that your organization can effectively leverage data from all available sources and quickly turn it into a competitive advantage. Reducing total cost of ownership through a lean, scalable data integration approach will allow big data to reach its full potential for your organization.
By planning for the future and keeping an eye on strategy, companies will see not only performance increases but also major business successes. With a sound strategy in place, big data can actually help provide the key to unlocking an organization’s next big opportunity.
Jorge A. Lopez, Senior Manager, Data Integration at Syncsort Incorporated, has more than a decade of experience in the Data Integration and Business Intelligence markets. He is based in Reston, Va. and can be reached at email@example.com.