Big Data and Machine Learning Lessons From Google Photos
Several months ago, Google introduced Google Photos, a breakthrough service that helps people organize and find their photos with far less manual effort than before. That is big news for photographers, but the way Google Photos works also heralds a new age for information technology.
Machines That Can Learn
As an avid photographer, I’ve seen my photo library grow from the analog days, when 24 photos on a roll was a lot, to now, when we snap hundreds of photos a day. The task of organizing all that information for easy retrieval has become more and more difficult. How difficult? Mylio estimates that 1 trillion photos will be taken in 2015.
At several million pixels each, these photos embody the definition of big data: a large volume of unstructured information, growing at an ever-increasing rate, stored on our smart devices, our computers and, more recently, the cloud. That doesn’t even account for videos, which compound the problem.
As you load your photos into Google Photos, it applies machine learning algorithms to organize them based on the patterns it detects, including faces, locations, objects that appear in the images and the proximity of events. Google can quickly recognize a photo of you and your family in front of the Golden Gate Bridge based on image recognition and on photos taken by others in the same area. Try it – it’s fun. Upload a bunch of photos into Google Photos and search for things like oceans, cars, Paris, etc.
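To make one of those signals concrete, here is a minimal sketch of grouping photos into events by how close together they were taken. This is only one reading of "proximity of events" and a deliberately simple stand-in, not Google's actual method; the function name `group_by_time_gap` and the two-hour gap are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

def group_by_time_gap(timestamps, max_gap=timedelta(hours=2)):
    """Split timestamps into events wherever the gap exceeds max_gap (assumed value)."""
    events = []
    for ts in sorted(timestamps):
        # Start a new event for the first photo or when the pause is too long.
        if not events or ts - events[-1][-1] > max_gap:
            events.append([ts])
        else:
            events[-1].append(ts)
    return events

# Made-up capture times for four photos.
photos = [
    datetime(2015, 7, 4, 10, 0),
    datetime(2015, 7, 4, 10, 25),
    datetime(2015, 7, 4, 18, 30),
    datetime(2015, 7, 5, 9, 15),
]
events = group_by_time_gap(photos)
print([len(e) for e in events])  # prints [2, 1, 1]
```

A real system would combine a signal like this with location and image content; the point is only that the grouping comes from the data rather than from albums you defined in advance.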
What was previously a tedious, highly manual task of organizing large quantities of photos into albums, events of interest, locations and other containers is now the machine’s job as it analyzes patterns in your imagery. By proposing searchable containers for your pictures, the algorithm does the modeling for you, rather than the other way around.
In business, similar methods transfer readily to your databases. When the machine recognizes fields containing information such as Social Security numbers, tax IDs, addresses, account balances and other pertinent customer information, you save the effort of classifying them yourself. The machine sorts and sifts through your information models and structures and lets you spend your time instead on deciding actions, such as whom to contact, what to sell, and when. In fact, the machine can make these recommendations for you wherever it detects patterns above a certain threshold of certainty.
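As a rough illustration of recognizing what a database field contains, the sketch below checks what share of a column's values match a known pattern and only proposes a label when that share clears a threshold. The patterns, the 0.9 threshold and the name `guess_field_type` are all hypothetical; a production system would likely use learned models and far broader rules.

```python
import re

# Hypothetical patterns for illustration; real data needs broader rules.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "us_zip": re.compile(r"\d{5}(-\d{4})?"),
    "amount": re.compile(r"-?\d+\.\d{2}"),
}

def guess_field_type(values, threshold=0.9):
    """Return (label, share) for the best pattern, or (None, share) below threshold."""
    if not values:
        return None, 0.0
    best_label, best_share = None, 0.0
    for label, pattern in PATTERNS.items():
        # Share of values that match this pattern in full.
        share = sum(bool(pattern.fullmatch(v)) for v in values) / len(values)
        if share > best_share:
            best_label, best_share = label, share
    if best_share >= threshold:
        return best_label, best_share
    return None, best_share

# Made-up columns with uninformative names.
columns = {
    "col_a": ["123-45-6789", "987-65-4321", "000-12-3456"],
    "col_b": ["12.50", "n/a", "300.00", "7.25"],
}
for name, values in columns.items():
    print(name, guess_field_type(values))
```

Here `col_a` is labeled as a Social Security number field, while `col_b` matches the amount pattern for only three of its four values and is left unlabeled for a person to review.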
The Google Photos process reflects a growing trend in big data called ‘Data First’: understanding your data starts with the data itself, not with predefined structures.
Harnessing the machine to do the modeling allows you to cover a lot more ground than previous methods and greatly reduces the backlog of previously ‘unsorted’ data we all have to deal with.
Today’s data warehouses require modeling, often the most critical task in organizing your data to optimize it for storage, ease of access and, most importantly, coherence and accuracy. The modeling task is time-consuming because it needs to be done correctly, and it often involves laborious steps such as requirements gathering, data classification and information architecture to sort the data into the appropriate tables and containers.
Now, traditional data warehouses are evolving, most clearly through machine learning’s ability to recognize patterns in your data without you teaching it. The model is created by the machine as it scans large quantities of data: it sorts the data into a proposed model where its certainty is high and leaves you with the small portion that has not passed a certain threshold of certainty. For that portion, the machine asks you to make the final decision.
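The hand-off between machine and person can be sketched as a simple triage step. This assumes the machine has already attached a label and a certainty score to each item; the scores, the 0.85 threshold and the name `triage` are invented for illustration.

```python
def triage(proposals, threshold=0.85):
    """Split (item, label, certainty) proposals into accepted and needs-review lists."""
    accepted, needs_review = [], []
    for item, label, certainty in proposals:
        # High-certainty proposals are filed automatically; the rest wait for a person.
        target = accepted if certainty >= threshold else needs_review
        target.append((item, label))
    return accepted, needs_review

# Made-up proposals: (item, proposed label, certainty).
proposals = [
    ("IMG_001", "beach", 0.97),
    ("IMG_002", "car", 0.91),
    ("IMG_003", "bridge", 0.62),
]
accepted, needs_review = triage(proposals)
print(len(accepted), "filed automatically;", len(needs_review), "left for review")
```

Raising the threshold sends more items to the review list and lowers the risk of a wrong automatic decision; lowering it does the opposite.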
This is a major shift from the traditional route of modeling the data first, then loading it. Now, you can gain power and insight by loading the data first, then modeling based on the content and meaning of the data. That gives you more of your time back to better understand what you have and make faster, smarter decisions.
Of course, there’s been much talk about the promise of machine learning, and there have been some specialized practices in that direction. Now, though, Google Photos is telling the world that you don’t need to teach the machine; you let the machine teach you. That sounds scary, but at the same time, it’s a very powerful shift that puts us more in control of our data and the future of our businesses.
Avi Kalderon heads up NewVantage Partners’ Big Data Fast Track program.