More and more organizations need to move large unstructured data sets across the world quickly and easily as part of their global media workflows or for big data analytics using Hadoop. But many are finding that traditional transfer methods often fail under the weight of today’s large data volumes and distributed networks.
Conventional methods for moving files over the Internet, like FTP and HTTP, remain the default means of transferring data, but they are highly inefficient at moving large files over high-latency, high-bandwidth networks. And as organizations with big data analytics initiatives evaluate new shared storage options, one critical step is often forgotten: how will they move the data there?
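The inefficiency of FTP and HTTP on long-haul links comes down to the TCP transport beneath them: a single connection's throughput is capped at roughly the TCP window size divided by the round-trip time, regardless of how much bandwidth the link actually has. A back-of-the-envelope sketch (the window size, RTT, and link speed below are illustrative assumptions, not measurements):

```python
# Illustrative TCP throughput ceiling: throughput <= window / RTT.
# All figures are assumptions chosen for illustration, not measurements.

def max_throughput_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on a single TCP stream's throughput, in megabits per second."""
    return window_bytes * 8 / rtt_seconds / 1e6

# A classic 64 KB TCP window over a 100 ms transcontinental round trip:
print(max_throughput_mbps(64 * 1024, 0.100))  # ~5.2 Mbps, even on a 1 Gbps link

# Bandwidth-delay product: the in-flight data needed to fill a 1 Gbps,
# 100 ms path with a single stream.
bdp_bytes = 1e9 / 8 * 0.100
print(bdp_bytes / 1024 / 1024)  # ~11.9 MB of unacknowledged data in flight
```

The point of the sketch is that adding bandwidth alone does not help a single FTP or HTTP transfer on a high-latency path; that is the gap file acceleration technologies are designed to close.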
Unless data is captured in the same place where it will be analyzed, companies need to move large volumes of unstructured data to readily accessible shared storage, such as HDFS (the Hadoop Distributed File System) or Amazon S3, before analysis. Most unstructured data is not generated where it can be analyzed; it is created at distributed points and must be transferred to a central location first.
For example, to perform efficient image analysis of surveillance video using a Hadoop cluster, video captured at remote camera locations must first be transferred to shared storage accessible to the cluster. Each minute of HD video recorded at 50 Mbps amounts to roughly 375 MB, nearly half a gigabyte, so moving video or any other type of large unstructured data in a timely manner requires more advanced transfer technology than FTP or HTTP.
The arrival of SaaS brought large file transfer solutions to the world
File movement tools are now advanced enough to handle the large file transfers that enterprises require. These solutions were initially adopted to transfer huge movie and television files within the Media & Entertainment industry, and until recently these next-generation file acceleration technologies were mostly confined to the vast IT infrastructures of media enterprises like Disney and the BBC.
It wasn’t until the cloud revolution and the development of SaaS/PaaS (software-as-a-service/platform-as-a-service) solutions that file acceleration technology became relevant and accessible to industries beyond media. The public cloud offers a virtually unlimited, elastic supply of compute power, networking, and storage, giving companies ready access to big data analysis capabilities. Combined with the SaaS/PaaS model’s pay-for-use pricing, software becomes a utility service: organizations pay for what they use without having to own and manage the underlying infrastructure.
A solution that can move big-data-scale files efficiently
The faster organizations can move data for analysis, the faster they can free up the storage holding that data at the collection points, cutting down on storage costs and management. And if a company is pursuing real-time analytics for competitive advantage, or offers analytics as a service, faster movement means faster results and a greater return on investment.
Moving data to the cloud, whether for analysis or storage, is sure to be part of almost every company’s future, especially for companies that are already data driven. Understanding the different methods for moving data, old and new, and which best match a business’s strategy for big data or cloud storage is a step that shouldn’t be overlooked.
Ian Hamilton is CTO of Signiant, which specializes in intelligent file movement.