Who are you, citizen data scientist?

Ugh. Everyone is talking about the citizen data scientist, but no one can define it (perhaps they know one when they see one).

Here goes — the simplest definition of a citizen data scientist is: non-data scientist. That’s not a pejorative; it just means that citizen data scientists nobly desire to do data science but are not formally schooled in all the ins and outs of the data science life cycle.

For example, a citizen data scientist may be quite savvy about which enterprise data is likely to matter when building a model but may not know the difference between a GBM (gradient boosting machine), a random forest, and an SVM (support vector machine). Those algorithms are data scientist geek-speak to many of them. The citizen data scientist’s job is not data science; rather, they use it as a tool to get their job done.

Here is my definition of the enterprise citizen data scientist: A businessperson who aspires to use data science techniques such as machine learning to discover new insights and create predictive models to improve business outcomes.

Citizen Data Scientists Are A Hardy Lot

They must be dedicated to their part-time craft, because doing data science is not easy. It requires learning the life cycle: data acquisition, data preparation, feature engineering, algorithm selection, model training, model evaluation, and, finally, insights and/or predictions. They may even have to learn to program in R or Python. If they are lucky (and smart), they will download a tool such as RapidMiner or KNIME, because these tools provide visual drag-and-drop interfaces instead of requiring hand-written code.
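
To make the life cycle steps listed above concrete, here is a minimal sketch in plain Python using only the standard library. Everything in it is an assumption made for illustration: the synthetic customer records, the field names (visits, spend, churned), the two toy algorithms, and every function name. It is not how any particular tool works; real projects would typically lean on a library or a visual tool for each step.

```python
import random
import statistics


def acquire(n=200, seed=0):
    """Data acquisition: fabricate synthetic customer records (hypothetical fields)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visits = rng.randint(0, 20)
        spend = rng.uniform(0, 500)
        # Made-up labeling rule: low engagement means the customer churned.
        churned = 1 if visits * 10 + spend < 250 else 0
        if rng.random() < 0.05:
            spend = None  # simulate a missing value in the raw data
        rows.append({"visits": visits, "spend": spend, "churned": churned})
    return rows


def prepare(rows):
    """Data preparation: fill missing spend values with the median."""
    median = statistics.median(r["spend"] for r in rows if r["spend"] is not None)
    return [{**r, "spend": median if r["spend"] is None else r["spend"]} for r in rows]


def engineer(rows):
    """Feature engineering: scale both inputs to the 0..1 range."""
    X = [(r["visits"] / 20, r["spend"] / 500) for r in rows]
    y = [r["churned"] for r in rows]
    return X, y


def train_majority(X, y):
    """Baseline algorithm: always predict the most common training label."""
    label = 1 if sum(y) * 2 >= len(y) else 0
    return lambda x: label


def train_centroid(X, y):
    """Nearest-centroid algorithm: predict the class whose mean point is closest."""
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))

    def predict(x):
        return min(
            centroids,
            key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])),
        )

    return predict


def accuracy(model, X, y):
    """Model evaluation: fraction of held-out examples predicted correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def run_life_cycle():
    X, y = engineer(prepare(acquire()))
    split = int(len(X) * 0.8)  # first 80% for training, the rest held out
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]

    # Algorithm selection: train every candidate and keep the best scorer.
    candidates = {"majority_baseline": train_majority, "nearest_centroid": train_centroid}
    models, scores = {}, {}
    for name, trainer in candidates.items():
        models[name] = trainer(X_train, y_train)
        scores[name] = accuracy(models[name], X_test, y_test)
    best = max(scores, key=scores.get)
    return best, scores, models[best]


if __name__ == "__main__":
    best, scores, model = run_life_cycle()
    print("held-out accuracy per algorithm:", scores)
    print("selected algorithm:", best)
    # Prediction for a new, low-engagement customer (scaled features).
    print("predicted churn label:", model((0.1, 0.1)))
```

Even in this toy form, every step is a separate decision that someone has to get right, which is why the visual tools mentioned above are attractive to part-time practitioners.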

Good News For Everyone Who Deals With The Gnarly

The best news for citizen data scientists is that many of the gnarliest aspects of the data science life cycle are being abstracted away by automated machine learning (AutoML) solutions. Products such as DataRobot, H2O.ai’s Driverless AI, and Google Cloud AutoML provide sophisticated tools that hide the gory details of data science so that citizen data scientists, and perhaps mere mortals, can analyze data and build robust machine learning models.

It’s also good news for data scientists, because the same automation of the data science life cycle can make them more productive. And it’s good news for business, because the demand for machine learning is becoming voracious.

(This post originally appeared on the Forrester Research blog.)