© 2019 SourceMedia. All rights reserved.

Greenplum Database

REVIEWER: Brian Dolan, director of research analytics for Fox Interactive Media.

BACKGROUND: Fox Interactive Media (FIM) – a division of News Corp. – is an interactive services company dedicated to connecting, informing, entertaining and empowering consumers with the most compelling online media experiences. FIM’s popular Internet destinations – including MySpace, Photobucket, Fox Sports Interactive, IGN, Rotten Tomatoes and AskMen – cross social media and high-value content verticals and reach the largest global audience of any major media company.

PLATFORMS: FIM runs Greenplum Database on Sun Microsystems' Sun Fire X4500 storage server and the Solaris Operating System. FIM uses the R data analysis language and SQL to run queries directly within the Greenplum data warehouse.

PROBLEM SOLVED: FIM operates some of the highest-traffic Web sites in the world and serves more than 5 billion online ads per day. Each of these ads is optimized and targeted to specific audiences based on analysis of Web traffic, user behavior and click patterns. As part of our targeted ad serving platform, we analyze nearly 2,000 identifying variables for each of the millions of visitors to our sites every day. While our targeting process was already one of the most advanced in the industry, we were eager to improve ad click-through rates further by fine-tuning targeting, which meant we needed to analyze massive volumes of data to discover patterns and identify relevant targeting criteria across segments and demographics.

PRODUCT FUNCTIONALITY: With Greenplum Database, our data analytics team can now run hundreds of thousands of statistical tests against tens of billions of rows of data to help us better target ads. Because of FIM’s massive data volumes, it is not feasible to extract this data from the warehouse for analysis. Instead, Greenplum Database enables us to analyze data directly within the warehouse using standard analysis tools like R. We can now complete 10,000 experiments against 20 million site visitors in just three hours. Previously, it took an entire day just to extract the data and another whole day to run the tests. This faster, more efficient data analysis has resulted in up to 200 percent higher click-through rates.

STRENGTHS: Greenplum Database offers two main strengths: the ability to process huge quantities of data faster thanks to its massively parallel model, and cost-effective, almost infinite scalability. When data volumes grow, we can easily add another commodity server running Greenplum to handle the new data load; there is no expensive proprietary hardware or additional software licensing to buy.

WEAKNESSES: The first weakness is the lack of a way to prioritize and balance queries at run time. Such a capability would allow resources to be shared more effectively between power users and reporting tasks. Second, the query optimizer is still immature.

SELECTION CRITERIA: FIM selected Greenplum for its massively parallel, almost infinitely scalable model, because we knew our data store would continue to grow exponentially over time. Greenplum was more cost-effective than traditional data warehouse solutions by many orders of magnitude, saving us a substantial amount of money. Greenplum also allows us to perform data analysis directly within the data warehouse, so we don't have to export data to analyze it. Greenplum is enabling us to push the limits of large-scale data analysis.

DELIVERABLES: Our research analytics team uses Greenplum Database to conduct tens of thousands of real-time tests against millions of users every day, analyzing each visitor's reaction to ads against more than 2,000 variables. The results are turned into reports for the BI, database administration, systems engineering and product teams.

VENDOR SUPPORT: We’ve had nothing but positive interactions with the Greenplum team, from first sales call through to implementation and support. The technical team has been able to answer all of our questions, and we have the utmost respect for the engineering minds behind Greenplum Database.

DOCUMENTATION: The documentation was easy to understand, and Greenplum was a hands-on partner in helping us develop our particular implementation.

Greenplum Database


1900 South Norfolk Drive, Suite 224

San Mateo, CA 94403

(650) 286-8012

