Jacob Spoelstra is global head of R&D at Opera Solutions, an independent analytics provider that boasts approximately 200 data scientists, one of the largest pools of practitioners in that emerging field outside of IBM. Spoelstra, a native of South Africa with a machine learning Ph.D. from the University of Southern California, develops internal analytics tech and strategies for Opera as well as customized, proof-of-concept projects for clients. His decade in the field also includes analytics work at SAS and Fair Isaac. From his San Diego office, Spoelstra recently discussed his thoughts on getting data scientists involved early in business meetings and continually learning about data needs, as well as different facets of his role over the last decade at Opera.
There are different definitions of data scientist floating around. How do you define what you do and what your team members do at Opera?
We typically call ourselves analytics specialists or scientists. I think it’s a person who can practically deal with data, who understands how to take data and can take on the basic manipulations of the data in databases and management tools. Then, it’s someone who can program and apply the statistical learning and modeling techniques that we use for extracting information from that data. Basically, it’s someone who can go all the way from taking the raw data, to analyzing and transforming that data based on its properties, to using machine learning to support a good, working business model.
From your position, then, how does interaction with the business side factor into research and development at Opera, and with clients in your customized analytics projects?
Well, we definitely lean toward finding business input. The data scientists are positioned as an integral part of the team [at Opera] and in projects with our business customers. Our projects involve some level of customization for our clients, so our data scientists are part of the business meetings from early on in the process and keep regular contact with their business point-person throughout [the project]. It’s important to have a sense of the business problem that’s being solved to avoid just solving some abstract problem with the data. To get there, you have to be very aware of what the unique business problems are.
As big data excitement has increased over the last year or two, do you see it trickling down into industry verticals and really finding traction with the needs of different industry niches?
The main thing, and it’s been coming for a few years, is that there’s more acceptance in the business world for using advanced analytics and machine learning to solve business problems. It’s the realization that there is lots of data, but you have to use advanced algorithms applied to that data to find any deep business value. It takes expertise to, first of all, deal with the big data. Managing it is such a small part of the problem. The real challenge is finding that value in the data, and our part as data scientists is extracting meaning in the data, which takes finding the parts of the data that are relevant to your business, then designing algorithms to extract that relevance. That’s where the machine learning element comes in, the importance of statistical modeling.
Do you think that as these systems get better, more people may opt for that as-a-service avenue? Or do the IT teams have to get smarter on the business side to make sure their roles in the enterprise remain vital?
What we’re quite interested in these days is the idea of human-plus-machine. When you’re dealing with high-volume data problems, the data itself becomes too much for you to analyze. But you want to integrate the human side of it to give perspective on decisions, either through using machine learning to prioritize and bring issues down to scale, or through analytics as a service where you feed back the intelligence to the business so that they find value more simply. The question, then, isn’t ‘Let’s make big data all as-a-service or in house.’ It’s a bit of both. Everyone is looking at better user interfaces, bolder visualizations, essentially easier ways for people to interact with the data and ask questions of the data. To that extent, [SaaS] can give the business user an easier way to make sense of the data without needing to be at that data scientist user level.
So, as your practice continues to emerge and grow in enterprise relevance, what’s the new challenge? How are you balancing those increased data loads with increased business expectations?
From the data scientist point of view, it’s this concept of making machines smarter in the product environment. It’s an area we focus on at Opera, the idea that you have real machine learning, as opposed to only predictive analytics. True machine learning means that you’re in the loop and … continually learning and adding to what’s happening in your business process. It’s something that’s becoming increasingly important, and still very hard to solve in a general way.