Nielsen Media Research, with operations headquartered in Dunedin, Florida, is the leading provider of television audience measurement and related services worldwide. Understandably, as a result of this research, data volumes at Nielsen Media Research are huge and represent the lifeblood of the organization.
Craig Silver, Senior Database Architect of the Data Technology Group at Nielsen Media Research, explains, "People buy and sell television advertising time based on these ratings." Used by everyone in the television industry -- the producers, the advertisers and everyone along the way -- Nielsen Media Research recognized the need for a consolidated data source that would support every national product they build.
Nielsen Media Research embarked on a project to build an audience data warehouse to meet their clients' needs in the best way possible. The short-term goal was to replace an existing system and application where ratings data was published monthly on CD-ROMs that were mailed to Nielsen clients. This new audience data warehouse is unique as it is an external data warehouse that enables Nielsen clients to access audience rating data through the Internet via a Web browser interface. Figure 1 illustrates the architecture of the Nielsen Media Research audience data warehouse system. To help validate architectural concepts developed by Nielsen that would incorporate this new Web-enabled delivery mechanism, Sun and Sybase built a "reference system" in the iForce Solution Center to test Nielsen's concept and workloads. This proof-of-concept led to the development of the Enterprise Data Warehouse Reference Architecture and provided Nielsen with valuable feedback that key elements of their ambitious new system would work. This allowed the development team at Nielsen Media to reduce the complexity, costs and risks of deploying new technology in their enterprise.
Figure 1: Audience Data Warehouse Architecture
The audience data warehouse required extreme flexibility because those accessing the data warehouse all have unique query and reporting requirements. Silver explains, "No one looks at things the same way here. The client at Turner may look at the data differently than an advertising agency." Turner would be interested in weekly or seasonal ratings for a particular show, while an advertising agency would want to know how a particular show fared during a specific advertising campaign.
Scott Springer, Senior Vice President of National Product Strategy & Delivery for Nielsen Media Research, states, "We view this audience data warehouse as a long-term solution for us, providing a single-source consistent database that many of our different reporting applications can use. It also gives us consistency and efficiency in data storage. Everyone gets his or her data from the same source. Our clients come into the audience data warehouse through the Internet, go through a browser interface and then go through our query engine to extract and recombine data to obtain their own intelligence."
Several years ago, Nielsen Media Research chose to convert much of their processing from a mainframe system to a UNIX two-tier client-server system powered by distributed Sun servers. Now, with the new application, Sun UNIX servers remain at the core, but the architecture is moving to n-tier. "The Sun UNIX environment gives us the different levels of scalability and flexibility that we could not get on the mainframe," states Springer. He adds, "Sun's servers are always competitive."
Kamal Nasser, Vice President of Information Technology for Nielsen Media Research National Services Group, states, "Data accuracy and consistency are obviously rated very high on our list of requirements." Comments Nasser, "We've built the system on multiple servers, a combination of Sun Fire V880s and Sun Fire 4800s. The new architecture is a true n-tier system." (See Figure 2.) "We leveraged the concepts of performance with J2EE and XML so that we have cleanly defined layers that represent the different functionality of the system." (See Figure 3.) "Within each layer are the formatter, the query engine, the scheduler, the job management and the presentation layer. All of these components -- the middle tier, the work tier -- are specialized. Taking these components and running them on these mid-sized Sun servers gives us tremendous value. Additionally, we get specialization. If the formatters' performance is not acceptable, we can add CPU and memory that is specific to them. It also allows us to create redundancy so that now we have two servers for formatters, two servers for the query engine and two servers for the middle tier. Any particular component could go down and an application would continue to run because wehave this redundancy. "
Figure 2: High Level N-Tired Architecture
Figure 3: N-Tired Fuctional Layers
The audience data warehouse was designed to allow Nielsen Media Research to build multiple applications on the same database. Additionally, the audience data warehouse provides expanded features for Nielsen clients. For example, data was previously provided on a monthly basis. The audience data warehouse provides it weekly and includes more data such as household characteristics. Nielsen was also able to change their old data mart approach to take advantage of the single data source. Nasser explains, "Previously we would build huge applications accessing specialized data marts. Now we're building smaller products accessing the data warehouse directly so that any new product or replacement product will point to the data warehouse."
According to Nasser, the requests that Nielsen clients run are not purchase orders but rather huge extractions of data for analysis. Each request may actually be made up of tens or hundreds of pages of SQL statements. "On the average, we see anywhere from 64 to hundreds of SQL statements within a report or analysis. Concurrency to us is a little bit different than the typical definition of concurrency because we have huge amounts of I/Os being processed," he states.
"That was the whole dilemma. With a traditional relational database, we would have been I/O bound," states Timothy Geary, Data Resource Manager of Information Technology for the National Product Group of Nielsen Media Research. "Now with Sun and Sybase IQ Multiplex, I look at our queries and we're pegging the box, CPU-wise. The max sustained rate we're doing is 600 gig/hour, which is not pushing the I/O of the SAN disk, but instead is limited by the number of CPUs we are using. We're not I/O bound. If we add more CPUs as the audience data warehouse grows, this thing will continue to scream."
Sybase IQ Multiplex (IQ-M) is the decision support system chosen by Nielsen Media Research for the audience data warehouse. Springer explains the choice, saying, "Sybase has put a lot of effort into making Sybase IQ a very robust system. They've focused this database very, very well for the types of extracts that we do. When you're trying to extract information in an intelligent fashion for decision support -- as opposed to transaction feeds -- you find that very few databases are tuned specifically for decision support purposes. Because we are a decision support shop -- not a transaction-heavy shop such as a bank or retail store -- we required support for incredibly sophisticated queries. Sybase IQ was the best choice for our requirements."
Another factor influencing Nielsen Media Research to select Sybase IQ-M involved its compression capabilities. Nasser explains, "Traditionally you have to massage the data, load the table and then build the index. With Sybase IQ, we're just formatting the data and building the index. That's helping with storage, that's helping with maintenance and that's helping with the loading. Obviously, the compression and the indexing structures of Sybase IQ are of great interest from a cost perspective. Right now the data in Sybase IQ is about 90 percent of the size of the raw data. Storage is one of our main cost items and while people say disk space is cheap, it is until you have to buy it in bulk. Unlike Sybase IQ, most DBMSs explode the raw data anywhere from 1.6 to 3 times original size, thereby increasing storage needs."
"The biggest advantage that we see from Sybase IQ," says Nasser, "is the loading. I would venture to say it's the fastest in the industry." Silver adds, "In the past, we were thrilled with Red Brick loads, but now with Sybase IQ, we are loading data eight times faster. We load almost 10 billion rows a month, a terabyte a quarter or 300 million rows a day. It just flies. We have a big enough backbone from Sun to handle that kind of data. Our mainframe, which feeds the data warehouse, can't produce the data fast enough. The audience data warehouse is something that we've been trying to do here since I started, but the technology wasn't up to it. Every report has to go through millions and millions of rows of data. Now with Sun servers and Sybase IQ, we're loading approximately 3.5 terabytes a year for this warehouse."
With Sun as the backbone, Nielsen Media Research has great plans for future uses of the audience data warehouse that is expected to be in the 12-terabyte range by mid-2003. "We plan to move many more applications onto this," states Springer. Nasser adds, "Our plan is to use this consolidated data source that encompasses more of the national data than previous sources to support just about every national product that we build. Although there is considerable initial cost, ultimately it's going to be a cost-effective solution as we'll be able to leverage multiple initiatives on multiple systems on distributed servers." Clearly the seamless growth enabled by The Enterprise Data Warehouse Reference Architecture will provide investment protection for years to come.
Geary, identified by his peers as one of the real champions of the audience data warehouse, sums up the team's efforts. He says, "This data warehouse is by far one of the largest I've ever seen and it's going to grow considerably every year. I am confident the hardware and technology we have selected will be able to keep up with our demands as we grow it to the 30-terabyte level."
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access