December 2, 2008 - Syncsort and Vertica, in collaboration with HP, have set a new world record for loading data into a relational database for business intelligence (BI) applications powered by HP BladeSystem c-Class.
This new world record shatters the previous extract, transform and load (ETL) performance benchmark. Syncsort's data integration product, DMExpress v4.8, extracted, transformed, cleansed and loaded 5.4 terabytes of raw data into the Vertica Analytic Database in 57 minutes 21.51 seconds, using highly cost-efficient HP BladeSystem c-Class running the Red Hat Linux operating system. The data was generated using the data generation tool of the industry-standard TPC-H benchmark.
The Syncsort/Vertica solution was run on two HP BladeSystem c7000 enclosures using a combination of server blades and storage blades. The efficiency of the software combined with the modularity and manageability of HP BladeSystem c-Class offered a high level of control, adaptive flexibility and performance. The density and flexibility of HP BladeSystem enabled the entire 16-node shared-nothing cluster, including all disk storage, to fit in less than a half rack (20U) with elevated energy efficiency.
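To put the headline figures in perspective, the load rate implied by the reported volume, elapsed time and node count can be worked out with a short calculation. This is only a sketch: the article does not say whether "terabytes" is decimal or binary, so the code assumes decimal terabytes (10^12 bytes), and the function and constant names are illustrative, not part of any vendor tooling.

```python
# Figures quoted in the article.
RAW_DATA_TB = 5.4                      # terabytes of raw data loaded
ELAPSED_SECONDS = 57 * 60 + 21.51      # 57 minutes 21.51 seconds
NODES = 16                             # nodes in the shared-nothing cluster

# Assumption: decimal terabytes (10**12 bytes); the article does not specify.
BYTES_PER_TB = 10**12


def load_rates(data_tb, elapsed_s, nodes):
    """Return aggregate and per-node load rates derived from the inputs."""
    total_bytes = data_tb * BYTES_PER_TB
    bytes_per_second = total_bytes / elapsed_s
    return {
        "bytes_per_second": bytes_per_second,
        "tb_per_hour": data_tb * 3600 / elapsed_s,
        "bytes_per_second_per_node": bytes_per_second / nodes,
    }


if __name__ == "__main__":
    rates = load_rates(RAW_DATA_TB, ELAPSED_SECONDS, NODES)
    print(f"Aggregate: {rates['bytes_per_second'] / 1e9:.2f} GB/s")
    print(f"Hourly:    {rates['tb_per_hour']:.2f} TB/hour")
    print(f"Per node:  {rates['bytes_per_second_per_node'] / 1e6:.1f} MB/s")
```

Under the decimal assumption, this works out to roughly 1.57 GB/s across the cluster, or a little under 100 MB/s per node.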
The record set by Syncsort and Vertica on HP BladeSystem c-Class is also the first ETL world record to be achieved using a columnar database.
"Until now, column-oriented databases have been seen as having a disadvantage in data loading speeds over traditional row-oriented databases," said David Menninger, vice president of marketing and product management of Vertica. "Next-generation database technology turned that paradigm on its ear, radically accelerating query performance against terabytes of data, without making data loading the weak link in the chain. We designed the Vertica Analytic Database and partnered with Syncsort to deliver the fastest possible performance across the deployment cycle."
The performance and scalability of DMExpress v4.8 and Vertica v2.5 were tested using 16 HP ProLiant BL460c server blades, each with two quad-core processors, and 16 HP StorageWorks SB40c storage blades. The data generation tool (DBGEN) of the industry-standard TPC-H benchmark was used to generate the source data, which represents business data containing a variety of data types.
The benchmark results were independently audited and verified by George Spofford of DSS Labs.