Organizations just can’t get enough skilled data professionals, and that labor gap will only get worse, many industry analysts predict. That sobering reality is fueling the rise of the so-called ‘citizen data scientist.’

Information Management spoke with Shawn Rogers, chief research officer at Dell Statistica, about who these people are and how they fit into the firm’s product development and marketing strategies. Rogers started off by confirming that the citizen data scientist was a top-of-mind topic among attendees at the recent Strata + Hadoop World conference in San Jose, CA.


Information Management: What are the most common themes that you heard among conference attendees and how do those themes align with what you expected?

Shawn Rogers: The main theme that seems to come up in some form or fashion in just about every conversation I have about analytics today, including the ones I had at the Strata + Hadoop conference, is that of the citizen data scientist.

I don’t think it’s any secret at this point that the global need for traditional data scientists – those hard-to-find Ph.D.s with backgrounds in mathematics and statistics – is greatly outpacing the available supply.

There are only so many of those folks to go around and only so many companies that can really afford to invest in them even when they are available. But I don’t think it’s a one-way street. In other words, I don’t think the citizen data scientist movement is happening just because organizations are desperate.

That’s certainly a part of it, but I think that an equally important – if not more important – driving force is the eagerness of the citizen data scientists themselves. They may not hold a Ph.D., but these are really smart people, and they understand that using analytics to unlock data insight and predict future outcomes is quickly becoming vital to the success of the lines of business within which they work.

They want to drive that success and they know that in order to do so, they have to get more involved with analytics. So, I think the desire of everyday business workers to make analytics a part of what they do is just as much a part of the citizen data scientist movement as the lack of true statisticians.

If you look at our most recent release of Statistica, as well as our short- and long-term product roadmaps and business strategy, it aligns directly to this theme. We’ve been focused and will continue to be focused on delivering capabilities aimed at empowering the citizen data scientist.


IM: What are the most common data challenges that attendees are facing?

SR: The one challenge that’s constantly coming up is the growth of IoT and the resulting need for IoT analytics.

Organizations are fast coming to the realization that IoT implementations are only going to become more vast and more pervasive, and that as that happens, the traditional analytic model of pulling all data into a centralized source such as a data warehouse or analytic sandbox is going to make less and less sense.

So, most of the conversations I’m having around IoT analytics today revolve around looking at how companies can flip that model on its head and figure out ways to push the analytics out to the edge. If you can run analytics at the edge, you can not only eliminate the time, bandwidth and expense required to transport the data, but also take immediate action in response to the insight. You speed up and simplify the analytic process in a way that’s never been done before.


IM: What are the most surprising things that you heard from attendees regarding their data management initiatives?

SR: I wouldn’t call it a huge surprise, but I’m always taken aback by how often I hear from companies that attempted to do too much, too soon with analytics.

There’s a real misconception out there that you apply analytics in order to uncover something completely new or to fix a process that’s completely and totally broken. That’s not to say you can’t apply analytics to achieve those types of outcomes, but those are massive undertakings, and if that’s where you’re starting with analytics, you might be looking at an uphill battle. Unfortunately, that’s exactly where a lot of companies are starting and they’re struggling as a result.

The better way to start with analytics, and the way I think organizations get the most immediate ROI, is by applying analytics to augment or enhance something that’s already working. In other words, find something that’s working and leverage analytics to make it work even better.

A great example of this is with manufacturing. If you’re a pharma company manufacturing drugs, obviously, you’ve got a process in place that’s already working and effective. But that doesn’t mean you can’t find ways to make it even better.

Maybe analytics helps you speed the process by which you validate your solutions in accordance with regulatory requirements. Or maybe it helps you predict problem areas and avoid a lost batch. Whatever the case, you’re not reinventing the manufacturing process, but you are adding tremendous value to the organization through the smart application of analytics.


IM: What does your company view as the top data issues or challenges in 2016?

SR: I think it’s pretty clear that empowering the citizen data scientist and addressing the explosive growth of IoT infrastructures are going to be the linchpin issues that dominate the analytics space in 2016. Whether you’re an end-user organization trying to manage these challenges on a daily basis, or a vendor trying to evolve your solutions to help your customers stay in front of them, pretty much every conversation will in one way or another tie back to one (or both) of these issues.


IM: How do these themes and challenges relate to your company’s market strategy this year?

SR: In a word – directly. Earlier this month, we announced the launch of Statistica 13.1, and if you look at the new functionality and capabilities in that release, you can in most cases draw a line directly back to our desire to either empower citizen data scientists or help companies address IoT analytics use cases. Let’s start with the latter issue.

In version 13.1, by working in tandem with the team at Dell Boomi, we’ve given users the ability to deploy “analytic atoms” on any edge device or gateway, including the Dell Edge Gateway 5000 Series, anywhere in the world. This edge scoring capability enables organizations to address nearly any IoT analytics use case by running analytic workflows directly at the edge of the network where data is created. So, that’s a huge step for both us and our customers, and we’re going to continue pushing that functionality forward in subsequent releases.

Then you have a slew of new capabilities designed to meet the specific needs of the citizen data scientist. We’ve added new data preparation functionality built specifically for the citizen data scientist, aimed at simplifying the preparation of structured and unstructured data.

Used in tandem with Statistica’s Reusable Process Templates, this functionality makes it easier than ever for users to share analytic workflows with non-technical users. What this means is that traditional data scientists can build analytic models and workflows once, and non-technical business analysts can reuse those workflow templates repeatedly within the organization. This makes it possible for business users to more efficiently use analytics to solve real business problems without the technical expertise traditionally required.

This is one of our richest releases ever, but at the risk of sounding cliché, it’s really just the beginning of a full-fledged effort to meet the needs of the non-technical user. You’ll always have your core statisticians, and we’ll always serve that group to the fullest, but analytics is and always has been about the business, and business users will increasingly become the driving force behind analytics initiatives.

So, we’ve got a full roadmap of initiatives planned for 2016 geared around the concept of collective intelligence – that is, making it easier than ever for non-technical users to leverage existing analytic expertise. It’s a really exciting time for Statistica and for the analytics community at large.