The development of Hadoop and the Hadoop Distributed File System (HDFS) has made it possible to load and process very large data files in a highly scalable, fault-tolerant environment. Data loaded into HDFS can be queried in batch through MapReduce and other cluster computing frameworks, which parallelize jobs for developers by shipping processing to the servers where the data resides, across a pool of machines that can be scaled out easily.
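As a concrete illustration of the kind of batch job such a framework parallelizes, the sketch below is the classic MapReduce word count in Java: the mapper runs on the nodes holding the input blocks and emits (word, 1) pairs, and the reducer sums the counts after the shuffle. The class name and input/output paths are illustrative, not taken from any particular deployment.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs wherever the input splits live and emits (word, 1) per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The developer writes only the map and reduce logic; the framework handles splitting the input, scheduling tasks near the data, and retrying failed tasks, which is what makes the environment scalable and fault tolerant without extra application code.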

The Hadoop environment also makes it easy to load data into HDFS without defining the structure of that data beforehand. Using Hadoop in this way naturally raises the question:
