If you have been paying attention to recent advances in machine learning (and deep learning in particular), you may have come across the notion of "transfer learning." Today, deep-learning-based approaches are not only improving prediction accuracy across a wide range of tasks; they are also enabling us to tackle more complex analytical problems.
However, to train a deep learning model effectively, you typically need thousands to millions of labeled examples first. And building a training dataset of this size is no trivial task unless you happen to work at Google or Facebook. So is there any way out of this corner? Transfer learning may be an answer.
Transfer learning encapsulates the notion that a deep learning model trained to solve one task learns generalized features that can be exploited to solve another (different) task.
When using traditional machine learning models, it is very difficult to discover features that are generally useful for other tasks. Deep learning models, however, are built hierarchically, with layers of features stacked on top of other features, and this structure tends to steer them toward learning representations that are useful for solving a variety of tasks.
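To make the idea concrete, here is a minimal sketch of the transfer-learning pattern: a frozen feature extractor, trained elsewhere, turns raw input into generalized features, and only a small logistic-regression head is trained on a handful of labeled examples. Everything here is hypothetical and illustrative: `PRETRAINED_VECTORS`, `embed`, `train_head`, and `predict` are made-up names, and the two-dimensional word vectors are hand-picked toy values standing in for features a real deep network would learn. This is not indico's model or API.

```python
import math

# HYPOTHETICAL stand-in for a frozen, pretrained feature extractor.
# A real system would use vectors learned by a deep network on a large corpus;
# these hand-picked toy values only exist so the sketch runs on its own.
PRETRAINED_VECTORS = {
    "great": (0.9, 0.1), "loved": (0.8, 0.2), "excellent": (0.95, 0.0),
    "terrible": (-0.9, 0.1), "hated": (-0.8, 0.2), "awful": (-0.95, 0.0),
    "movie": (0.0, 0.5), "film": (0.0, 0.5), "it": (0.0, 0.3),
}
DIM = 2


def embed(text):
    """Average the frozen word vectors; words the extractor does not know are skipped."""
    vecs = [PRETRAINED_VECTORS[w] for w in text.lower().split() if w in PRETRAINED_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(texts, labels, lr=1.0, epochs=500, l2=0.01):
    """Train only a logistic-regression head; the extractor above is never updated."""
    X = [embed(t) for t in texts]
    w, b = [0.0] * DIM, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(DIM):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        # Gradient step with a small L2 penalty on the weights (not the bias).
        w = [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, text):
    x = embed(text)
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


if __name__ == "__main__":
    w, b = train_head(["great movie", "loved it", "terrible film", "hated it"], [1, 1, 0, 0])
    # "excellent" never appears in training, but its frozen vector sits near "great".
    print(predict(w, b, "excellent film"))  # prints 1
```

The head sees only four labeled examples, yet it handles "excellent" correctly because the frozen features already encode what the extractor learned elsewhere. That reuse of prior knowledge is what lets a model get by with little labeled data.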
To validate this concept, we ran a simple experiment to quantify the value of transfer learning in real terms, on three common tasks: sentiment analysis, topic classification, and image classification. Using what we call a Custom Collection API, we built bespoke models around the generalized features generated by a set of underlying deep learning models that have been pre-trained with a wide variety of knowledge about images and text.
For the purposes of our comparison, we trained two separate models, then made predictions on a separate dataset that was held out during training.
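The train/held-out protocol is easy to reproduce. A minimal sketch, assuming the labeled examples sit in two parallel Python lists: shuffle the indices with a fixed seed, take `n_train` examples for training, and hold out the rest for prediction. The function name `train_holdout_split` and its signature are hypothetical.

```python
import random


def train_holdout_split(examples, labels, n_train, seed=0):
    """Return (train_X, train_y, heldout_X, heldout_y) from parallel lists."""
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    if not 0 < n_train < len(examples):
        raise ValueError("n_train must leave at least one example held out")
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)  # fixed seed -> reproducible split
    train_idx, heldout_idx = order[:n_train], order[n_train:]
    return ([examples[i] for i in train_idx], [labels[i] for i in train_idx],
            [examples[i] for i in heldout_idx], [labels[i] for i in heldout_idx])
```

Because the seed is fixed, the split is reproducible, so both the control model and the test model can be trained and scored on exactly the same examples.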
Our control model was a well-established machine learning model using features that are known to work well. For text, the features are essentially normalized word counts (TF-IDF: term frequency / inverse document frequency vectors). For images, we used HOG (histogram of oriented gradients) features. These features were fed into a logistic regression model for training and prediction. Our test model used the Custom Collection API: we fed in data, trained a model, and made predictions, with transfer learning for text and image analysis happening under the covers.
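As an illustration of the control model's text features, the sketch below computes L2-normalized TF-IDF vectors with stop words removed, using only the Python standard library. The stop-word list and the regex tokenizer are simplified assumptions. The smoothed IDF, ln((1 + N) / (1 + df)) + 1, follows the default of scikit-learn's `TfidfVectorizer`, which a real pipeline would more likely use; the article does not say which implementation or settings were used.

```python
import math
import re
from collections import Counter

# A tiny, illustrative stop-word list (real pipelines use a much longer one).
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}


def tokenize(text):
    """Lowercase, split on runs of letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalized TF-IDF vectors, one per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency within this document
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors
```

For the documents "The movie was great", "The movie was terrible", and "A great, great film", the vocabulary is `["film", "great", "movie", "terrible"]`, and in the second vector "terrible" (found in one document) outweighs "movie" (found in two).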
Below are the results across the three different tasks.
Sentiment analysis: Given a piece of text, classify its tone as positive or negative.
Dataset analyzed: Large Movie Review Dataset
Benchmarked against: TF-IDF vectors of the samples (with stop words removed) fed into logistic regression, with a grid search for the optimal regularization parameter.
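The baseline's grid search for a regularization parameter can be sketched as follows: fit one L2-regularized logistic regression per candidate value of C (the inverse regularization strength, as in scikit-learn's `LogisticRegression`) and keep the value that scores best on a validation split carved out of the training data, so the held-out test set stays untouched. This is a minimal sketch; the grid `(0.1, 1.0, 10.0)`, the learning rate, the epoch count, and the use of a single validation split are all assumptions, since the article does not say how the search was run (scikit-learn's `GridSearchCV` would use cross-validation instead).

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, C, lr=0.5, epochs=300):
    """Batch gradient descent on L2-regularized log loss; C is the inverse regularization strength."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # L2 penalty gradient, scaled by 1/n to match the averaged log loss;
        # smaller C means stronger regularization. The bias is not penalized.
        w = [wj - lr * (gj + wj / (C * n)) for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def accuracy(w, b, X, y):
    preds = [1 if sigmoid(sum(a * c for a, c in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def grid_search_C(X_train, y_train, X_val, y_val, grid=(0.1, 1.0, 10.0)):
    """Fit one model per C on the training split; keep the C with the best validation accuracy."""
    best_C, best_acc = None, -1.0
    for C in grid:
        w, b = train_logreg(X_train, y_train, C)
        acc = accuracy(w, b, X_val, y_val)
        if acc > best_acc:  # ties keep the earlier (more regularized) C
            best_C, best_acc = C, acc
    return best_C, best_acc
```

In the benchmark, each row of `X_train` would be the TF-IDF vector of one review and `y_train` its positive (1) or negative (0) label; the model refit with the chosen C is then scored once on the held-out data.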
Topic classification: Given a news article, assign it to one of four categories based on its content.
Dataset analyzed: Aggregated News
Benchmarked against: TF-IDF vectors of the samples (with stop words removed) fed into logistic regression, with a grid search for the optimal regularization parameter.
Image classification: Given an image and 25 possible categories, assign the image to the best-fitting category.
Dataset analyzed: Caltech101
Benchmarked against: logistic regression trained on HOG features combined with a histogram of each color channel for each sample, with a grid search for the optimal regularization parameter.
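For the image baseline, here is a simplified sketch of the feature vector described above: HOG-style histograms of gradient orientation per cell, with a normalized histogram of each color channel appended. It is an illustration under stated assumptions, not the article's code: images are plain nested lists of (r, g, b) integer tuples, the cell size and bin counts are arbitrary choices, and the block normalization step of full HOG is omitted. Libraries such as scikit-image provide a complete `hog` function in `skimage.feature`.

```python
import math


def grayscale(img):
    # img: list of rows, each row a list of (r, g, b) integer tuples in 0-255.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def hog_features(gray, cell=4, n_bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation (0-180 degrees),
    weighted by gradient magnitude. Block normalization from full HOG is omitted."""
    h, w = len(gray), len(gray[0])
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * n_bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # Central differences, clamped at the image border.
                    gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
                    gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
                    mag = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle // (180.0 / n_bins)) % n_bins] += mag
            feats.extend(hist)
    return feats


def color_histograms(img, n_bins=8):
    """One normalized histogram per color channel, concatenated (R, then G, then B)."""
    n_pixels = len(img) * len(img[0])
    feats = []
    for ch in range(3):
        hist = [0.0] * n_bins
        for row in img:
            for px in row:
                hist[px[ch] * n_bins // 256] += 1.0 / n_pixels
        feats.extend(hist)
    return feats


def image_features(img):
    # HOG features with the per-channel color histograms appended.
    return hog_features(grayscale(img)) + color_histograms(img)
```

For an 8x8 image with the defaults, this yields 4 cells x 9 bins = 36 HOG values plus 3 x 8 = 24 color values, a 60-element vector that would then be fed to logistic regression.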
The results above show that the transfer learning approach using custom collection outperforms the traditional machine learning models by a large margin, especially when training with small amounts of labeled data. The traditional models improve as we increase the amount of training data, but in all cases the custom collection models perform as well or better.
The key takeaway is that with only 100 training examples, the transfer learning approach beats a traditional machine learning model trained from scratch and delivers generally good performance on each task. That's a big difference, because labeling 100 examples is something an analyst or expert can do in under an hour.
Transfer learning opens up a practical way for users to quickly leverage deep learning techniques and build better performing models. So yes, sometimes, it can be possible to have your cake and eat it too!
(About the author: Vishal Daga is chief customer officer at indico.io, a startup in the machine learning and artificial intelligence space. In his role, Vishal works closely with indico’s enterprise customers to help them extract meaningful insight from unstructured data in applications such as sentiment analysis, social media monitoring, content filtering, content classification, recommendations, and personalization. Prior to indico, Vishal was in the data analytics space with IBM and Netezza.)