For Netflix, the DVD delivery and video streaming company, gaining insight into how customers consume and interact with its offerings is the lifeblood of the business. And with 50 million customers in 40 different countries—and about 10 petabytes of data in its warehouse—it has plenty to draw on to find threads of information that could lead to operational efficiencies, new products and better service.

Today, the company collects and analyzes data on everything from viewer preferences to the technical quality of content to repair service effectiveness. From its data analysis, Netflix knows, for example, that lighter users view more movies than TV shows, while heavier viewers tend to watch more TV shows than movies. It can also tell if users stop viewing a show or movie because of audio failures or out-of-sync subtitles. All the while, it can predict which programs users will enjoy based on feedback and reviews.

Netflix launched in 1997. As it grew, company leaders knew their DVD rental operation was collecting tons of data on its customers and the movies they liked to watch, and they knew that data could provide deep, rich business insights. However, in the early days, management felt constrained by the company’s traditional data-center infrastructure. “We couldn’t just throw more hardware at it and go,” says Kurt Brown, director of data platform for Netflix. “We had to redistribute the data and do a lot of heavy lifting to do the next step-function.”

Those considerations coincided with Netflix’s shift to video streaming. In February 2007, after nearly 10 years in business, Netflix delivered its billionth DVD. At the time, the company was comfortable with having movies available 99.9% of the time, as a short delay in delivery wasn’t too problematic for customers, chief product officer Neil Hunt says.

But not long after Netflix shifted its focus to streaming media, availability took on a new meaning. Viewers had to be able to access shows and movies in real time, and, according to Hunt, company leaders realized they needed to get closer to 99.99%.
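
To put those two figures in perspective, the gap between 99.9% and 99.99% availability can be expressed as allowable downtime per year. The sketch below is illustrative arithmetic only. It assumes a 365-day year, the function name is made up for this example, and nothing here reflects how Netflix itself measures availability.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(availability_pct: float) -> float:
    """Return the hours per year a service may be down at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.9% leaves room for roughly 8.76 hours of downtime a year, while 99.99% leaves less than one hour.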

The shift to an online business model was nothing short of seismic and necessitated a re-architecture of Netflix’s technology background. The company decided to make what was, at the time, a bold move to the cloud. That decision offered the opportunity to move to a new set of technologies that would strengthen its analytics capabilities and position the company to capitalize on its big data.

Producing Better Data

The cloud migration led Netflix to a technology stack anchored by Amazon Web Services (AWS), Hadoop data processing technologies, and Amazon Simple Storage Service (S3), a component of AWS.

Netflix moved from relational databases to Not Only SQL, or NoSQL, primarily using Hadoop and Cassandra databases, according to Hunt. Netflix also uses Teradata and Redshift databases, which aren’t NoSQL but are big-data analytics databases, Hunt says.

Netflix has two primary Hadoop clusters for heavy-duty analytics, or jobs on the scale of billions of events, and another cluster for ad-hoc queries. The company also uses three “bonus” clusters every night to expedite nightly batch processing. It also has about a dozen smaller test clusters that can be used at any time.

Amazon’s S3 serves as a central data hub. Netflix can put petabytes upon petabytes of data into it, according to data platform director Brown, and then directly access it or move the data to a system that’s more suited to a particular project. In some cases, Netflix uses reporting tools from MicroStrategy in a Teradata or Amazon Redshift data warehouse, in which case S3 is used as a “staging ground” to store the massive amounts of data and then to push out subsets needed for analyses, Brown says. 

Netflix also switched from a consolidated, monolithic application system housed in the data center to a service-oriented architecture in the cloud. There, the company keeps hundreds of micro-services that run independently of each other. By isolating those components, says Yury Izrailevsky, vice president of cloud computing and platform engineering, Netflix is able to innovate more quickly without the need for teams to coordinate in a centralized fashion.

For reporting, Netflix uses a combination of dashboards and MicroStrategy. It also uses SolarWinds’ Ignite database performance monitoring tool, which allows Netflix to review aggregated data. For example, employees can see how many users watched for sustained periods of time or took actions such as renewing, canceling, rating a show or movie, or suggesting one to a friend, according to Hunt. Ignite allows product managers to slice and dice customer data based on region, method of payment and other characteristics to help determine which product tests should be rolled out or need improvement.
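
As a rough picture of what slicing aggregated data by region or payment method involves, the sketch below counts customer actions along chosen dimensions. It is a minimal illustration in plain Python, not a depiction of Ignite or of any Netflix system. The records, field names and the `slice_counts` helper are all invented for this example.

```python
from collections import Counter

# Hypothetical event records; every field and value here is invented.
events = [
    {"region": "US", "payment": "card", "action": "renewal"},
    {"region": "US", "payment": "card", "action": "cancelation"},
    {"region": "UK", "payment": "paypal", "action": "renewal"},
    {"region": "US", "payment": "paypal", "action": "renewal"},
]


def slice_counts(records, *dimensions):
    """Count records grouped by the values of the given dimension fields."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)


by_region_action = slice_counts(events, "region", "action")
print(by_region_action[("US", "renewal")])  # 2
```

Passing different field names, such as `slice_counts(events, "payment")`, regroups the same records along another dimension without changing the underlying data.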

In fact, Hunt says, its sophisticated big data and analytics platform gives Netflix insight into five key areas:

For customer service, the platform helps the company see what, when and how much customers view, and what behaviors mesh with overall customer satisfaction.

For product development, Netflix learns when customers visit the service, what devices they engage with and which features are used, among other things.

For content development, Netflix can see what titles are most popular, which shows or movies tie to higher retention, and both successful and failed searches.

For technical quality and content delivery, Netflix can determine if it should use extra bandwidth to drive better streaming quality or lower the risk of re-buffering.

And for maintenance service, Netflix can look at interaction satisfaction scores to determine what issues to fix. 

Looking back, Netflix’s expectation of richer data proved to be spot on. Netflix’s big data implementation is delivering tangible results.

Over the last five years, Netflix customer usage has increased by two orders of magnitude, Izrailevsky says, while data volumes have increased several thousand times over.

And, early last month, Netflix’s share price hit an all-time high.
