Machine learning helps Pinterest maintain 'world's catalog of ideas'
San Francisco-based Pinterest touts itself as the “world’s catalog of ideas,” the foundation of which is photos and other visual images that are “pinned” by users. Founded in 2010, the company enables millions of consumers across the globe to find recipes and style inspiration and to connect with communities of users who share their interests.
To make its offerings more personalized for users, the company has leveraged big data and search functions such as visual discovery tools to engage consumers and connect them with the information they’re searching for.
Following is a Q&A with Maura Lynch, product manager at Pinterest, who is helping to develop new tools that enable Pinterest to home in on user preferences even as its library of content grows exponentially. Before assuming her current role, Lynch worked in analytics at Pinterest and in the gaming world for several years. She started her career in physics research at Duke University and in economics at the Federal Reserve.
Information Management: A big part of how Pinterest engages with users is with suggestions of new “pins.” How does Pinterest determine what to suggest? How do new user recommendations work?
Maura Lynch: Recommendations are a core part of our discovery experience. We map the connections between Pins, people and boards into what we call the Taste Graph, and use these connections to recommend more than 10 billion Pins every day. We make recommendations based on a combination of your interests and related Pins in the Taste Graph.
If you're new to Pinterest, we ask you to tell us a few things you're interested in so we can start showing ideas you might be into. If you've been using Pinterest for a while, we have a lot of signals that help us make more relevant, personalized recommendations, and we can even help you discover things you may not have known about.
We try to understand which Pins are related to your interests and tastes. We use machine learning, computer vision and deep learning to process dozens of data points and signals, such as text on a Pin, what's inside the Pin's image and what boards others have saved the Pin to.
Beyond understanding what a Pin is about, we also want to know how inspiring or interesting a Pin is to someone who likes that topic. Using machine learning, we make a prediction of how interesting or useful a Pin is based on how others reacted to it. For new Pinners especially, it's important for us to show only the best Pins we have about a topic, so we rely heavily on how others have interacted with these.
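The ranking idea Lynch describes can be sketched in a few lines of Python. This is an illustrative toy, not Pinterest's actual system: the engagement signals, weights and smoothing are all invented for the example, which simply combines a topic-match count with an engagement-based quality estimate.

```python
# Toy sketch of interest-plus-engagement ranking. All field names and
# weights are hypothetical, not Pinterest's real model.

def quality_score(saves, closeups, impressions):
    """Estimate how useful a Pin is from how others reacted to it
    (a simple smoothed engagement rate)."""
    # Additive smoothing so brand-new Pins aren't scored at the extremes.
    return (saves + 2 * closeups + 1) / (impressions + 10)

def rank_pins(candidate_pins, user_interests):
    """Order candidate Pins by interest overlap weighted by quality."""
    scored = []
    for pin in candidate_pins:
        match = len(set(pin["topics"]) & set(user_interests))
        if match == 0:
            continue  # skip Pins unrelated to the user's interests
        score = match * quality_score(pin["saves"],
                                      pin["closeups"],
                                      pin["impressions"])
        scored.append((score, pin["id"]))
    return [pin_id for _, pin_id in sorted(scored, reverse=True)]

pins = [
    {"id": "a", "topics": ["cooking"], "saves": 50, "closeups": 20, "impressions": 400},
    {"id": "b", "topics": ["cooking", "baking"], "saves": 5, "closeups": 1, "impressions": 300},
    {"id": "c", "topics": ["travel"], "saves": 90, "closeups": 30, "impressions": 200},
]
print(rank_pins(pins, ["cooking", "baking"]))
```

The key design point is the one Lynch makes: a Pin is surfaced not only because it matches a topic, but because others who like that topic found it worth engaging with.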
IM: One of the anchors of this effort is how Pinterest creates a “lightweight” signal. Can you define what a lightweight signal is, and how the company creates preferences based on that?
Lynch: The lightweight signal we use at Pinterest is a single question: "What kinds of ideas are you interested in?" It's a multiple choice question that allows Pinners to choose as many interests as they want. Once we know what you're interested in, broadly, we use our content understanding systems to show you ideas related to those topics.
Over time, as we gather more signals, we can make more granular recommendations for your personal taste. For example, if you're interested in cooking, we start by recommending recipes and cooking tips, and over time can make more specific recommendations, like dairy-free or gluten-free recipes.
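The coarse-to-fine pattern Lynch describes can be illustrated with a small sketch. The topic hierarchy and signal names below are made up for the example: with no finer signal the system recommends broadly within a chosen interest, and once engagement narrows things down, it favors the specific subtopics.

```python
# Illustrative coarse-to-fine topic selection; the hierarchy is invented.
TOPIC_TREE = {
    "cooking": ["recipes", "cooking tips",
                "gluten-free recipes", "dairy-free recipes"],
}

def recommend_topics(interest, engaged_subtopics):
    """Start broad within an interest; narrow once engagement signals arrive."""
    subtopics = TOPIC_TREE.get(interest, [])
    refined = [t for t in subtopics if t in engaged_subtopics]
    return refined or subtopics[:2]  # no fine signal yet: fall back to broad picks

print(recommend_topics("cooking", set()))                    # broad recommendations
print(recommend_topics("cooking", {"gluten-free recipes"}))  # refined recommendations
```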
IM: Pinterest recently took recommendations to a higher level – it now has a visual search tool that allows users to pick a portion of a photo – an object in the photo that caught their eye – then use that bit of the photo to look for related or similar things on the site. How does that work?
Lynch: In 2014, we built a computer vision pipeline and stack, and this became the foundation of products like visual search. Visual search enables people to pinpoint parts of an image and get visually similar results.
As the next evolution of visual search, we introduced real-time object detection last summer. This made visual search easier to use, and helped us grow our corpus of objects we recognize. Using the data we gained from visual search usage, combined with our visual search infrastructure, we launched Lens and Shop the Look last week.
Shop the Look uses computer vision and object detection to identify products in lifestyle images, such as shoes, bags, tables and lamps, and recommends products you can buy inside the Pin on Pinterest or directly from a brand.
Lens is a completely new way to search for ideas using your phone's camera. Just tap the Lens icon in the Pinterest app, point it at anything and Lens will return visually similar objects or related ideas. Lens combines our understanding of images and objects with our discovery technologies to offer Pinners a diverse set of results.
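At its core, "visually similar results" usually means nearest-neighbor search over image embeddings. The sketch below is a toy under that assumption: in practice Pinterest's computer-vision pipeline produces the embeddings and the search runs at scale, but here each image is just a tiny hand-written feature vector ranked by cosine similarity.

```python
import math

# Toy visually-similar retrieval: assume each catalog image already has an
# embedding vector (here hand-written) and rank by cosine similarity.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def visually_similar(query_vec, catalog, top_k=2):
    """Return the top_k catalog image ids closest to the query embedding."""
    ranked = sorted(catalog.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]

catalog = {
    "lamp_1":  [0.9, 0.1, 0.0],
    "lamp_2":  [0.8, 0.2, 0.1],
    "chair_1": [0.1, 0.9, 0.3],
}
print(visually_similar([1.0, 0.0, 0.0], catalog))
```

Cropping an object before searching, as in the visual search tool, amounts to embedding only the selected region rather than the whole image.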
IM: With all the data Pinterest collects, where does it put it all? How does it manage it all?
Lynch: Pinterest runs on AWS, primarily in S3, but we also store data in places like HBase for faster access.
IM: How is it able to pull all this info together? What big data tech is used?
Lynch: We primarily use Hadoop.
IM: What’s the underlying analytics/machine learning software?
Lynch: We built custom analytics and machine learning software in-house.
IM: Pinterest is running a huge data shop. What have been some of the big challenges it faced in building these data stores and analytics capabilities, and how did it overcome them?
Lynch: One of the biggest challenges we've been working to solve is making big data easily accessible to anyone at the company to foster data-driven decision making. This has been a non-trivial task, as our data has grown to over 100PB spanning thousands of tables.
Core to addressing these challenges has been requiring every data source to be accessible via a SQL interface that abstracts away the underlying system and schemas. That abstraction allows us to continuously evolve those systems without impacting the analysts and engineers who rely on them every day to make decisions and build products.
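One minimal way to picture that abstraction layer is a logical-to-physical table mapping: analysts write queries against stable logical names, and a resolver rewrites them to whatever storage currently backs each table. The mapping and names below are invented for illustration, not Pinterest's infrastructure.

```python
# Hypothetical logical-to-physical table mapping. Migrating a table to a
# new version or storage system only changes this dict, not user queries.
PHYSICAL_TABLES = {
    "pins": "s3_warehouse.pins_v3",
    "users": "hbase_mirror.users_daily",
}

def resolve(sql):
    """Rewrite logical table names in a query to their current physical names.
    (A real system would parse the SQL; naive string replace is for sketch only.)"""
    for logical, physical in PHYSICAL_TABLES.items():
        sql = sql.replace(f" {logical} ", f" {physical} ")
    return sql

print(resolve("SELECT count(*) FROM pins WHERE topic = 'cooking'"))
```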