The Harvard Business Review declared “data scientist” the sexiest job of the 21st century. After getting over my disappointment that “data modeler” was not chosen, I started thinking, What’s the difference between a data modeler and data scientist, anyway? I asked this question of the Data Design Challenges group and the following is a brief summary of the responses received.
Data modelers and data scientists both need to be excellent communicators -- during the discovery process as well as in the design of the final presentation medium. During the discovery process, communication takes the roles of detective and diplomat and often requires interacting with business and IT to elicit requirements. During the design of the final presentation medium, both data modeler and data scientist need to determine the ideal format for communication. The audience is an important factor in whether the data modeler will choose a conceptual, logical or physical data model, as well as in the notation used and symbol arrangement to make the model readable and, therefore, a useful communication tool. The data scientist similarly focuses on data visualization to make it easier for the audience to understand lots of complex data.
With this common bond of communication between data modeler and data scientist, we can now focus on what is different. The differences relate to role, math and certainty.
Role: Designers versus Consumers
Data modelers need to capture and precisely represent structure in the design of applications, which include the applications used by data scientists who, therefore, become the data consumers. Fakhrudin Jivanjee, data analyst, says, “A data modeler designs and creates the structure that will house the data, the data scientist works with the data sitting inside it.”
Asoka Diggs, enterprise architect, calls the data modeler the “entity-relationship modeler” and the data scientist the “analytic modeler” for this reason. He says: “The difference between an ER modeler and an analytic modeler is that the ER modeler is identifying and structuring data, with the outcome that the data is more understandable, more usable, of higher quality, and so forth. The analytic modeler is using that structured data, as well as unstructured data, to find patterns that can be used to increase the information and knowledge that can be derived from the data.”
Karen Lindokken, enterprise architect, further comments on the designer versus consumer role. “I think of the data scientist role as focused on data content and the information relationships that can be discovered from analysis of that content. Many of us data-centric types already gravitate between the roles. A data modeler would need to expand from understanding the structural requirements and business context, to understanding content and rigorously applying a variety of statistical analysis mechanisms to gain knowledge from that content in a context that is far more macroeconomic than where the data modeler generally sits.”
Math: Set Theory versus Statistics and Operations Research
The underlying foundation for both data modeling and data science skills is mathematics. The data modeler uses set theory when modeling relational-based applications. The data scientist primarily uses statistics and operations research. Both Peter Heller and Thijs van der Feltz, information architect, mention this distinction. Says van der Feltz, “A data scientist needs to be a strong statistician, needs to be highly skilled in query tools/SQL, and needs to be knowledgeable about some area of business (e.g., risk management) to know what to look for.”
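To make the contrast in this section concrete, here is a minimal Python sketch. The customers, accounts and balances are invented purely for illustration and do not come from any of the respondents. The first half treats relations as sets of tuples, which is the view a data modeler takes; the second half summarizes the same data statistically, which is closer to the data scientist's starting point.

```python
from statistics import mean, pstdev

# Hypothetical relations, each modeled as a set of tuples.
customers = {(1, "Ann"), (2, "Bob"), (3, "Cy")}                # (customer_id, name)
accounts = {(10, 1, 250.0), (11, 1, 750.0), (12, 2, 100.0)}    # (account_id, customer_id, balance)

# Set theory view: a join is the subset of the Cartesian product
# of the two relations where the customer ids match.
joined = {
    (name, account_id, balance)
    for (customer_id, name) in customers
    for (account_id, owner_id, balance) in accounts
    if customer_id == owner_id
}

# Set membership also answers structural questions,
# such as which customers own no account at all.
owner_ids = {owner_id for (_, owner_id, _) in accounts}
without_account = {name for (customer_id, name) in customers
                   if customer_id not in owner_ids}

# Statistics view: describe the content rather than the structure.
balances = [balance for (_, _, balance) in accounts]
average_balance = mean(balances)
balance_spread = pstdev(balances)  # population standard deviation

print(sorted(joined))
print(without_account)
print(average_balance, balance_spread)
```

The set operations say nothing about whether the balances are typical or unusual, and the summary statistics say nothing about how the data is organized, which is roughly the division of labor described above.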
Certainty: Known versus Unknown
A data modeler needs to know most if not all requirements in advance to complete the models. Can a Customer own more than one Account? What do you want to know about the Customer? How do you identify a Customer? Questions like these need to be answered and become part of the data modeling deliverable. Of course, the data modeler can always use abstraction when faced with the unknown. Terms like Party and Event allow flexibility beyond the known requirements.
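As a rough illustration of these questions, the Python sketch below records one possible set of answers as a small model, and then shows the more abstract Party alternative. All class names, attributes and sample values are assumptions made for the example, not a recommended design.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Account:
    account_number: str


@dataclass
class Customer:
    # "How do you identify a Customer?" -> assumed here to be an id.
    customer_id: str
    # "What do you want to know about the Customer?" -> assumed: a name.
    name: str
    # "Can a Customer own more than one Account?" -> assumed: yes.
    accounts: List[Account] = field(default_factory=list)


@dataclass
class Party:
    # Abstraction: a Party can play roles that were not known
    # when the model was designed (Customer, Supplier, ...).
    party_id: str
    name: str
    roles: Set[str] = field(default_factory=set)


# Concrete model: the requirements are fixed in the structure.
ann = Customer(customer_id="C-1", name="Ann")
ann.accounts.append(Account("A-10"))
ann.accounts.append(Account("A-11"))

# Abstract model: a new role needs no structural change.
acme = Party(party_id="P-1", name="Acme")
acme.roles.update({"Customer", "Supplier"})

print(len(ann.accounts), sorted(acme.roles))
```

The trade-off is the usual one: the concrete Customer model documents the business rules directly, while the Party model stays flexible at the cost of saying less about what is actually allowed.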
Thijs van der Feltz says, “A data scientist is fairly comfortable with uncertainty and loose structures, while a data modeler is more comfortable in a structured world.” Susan Herrmann, associate director of business intelligence and financial solutions, similarly comments, “The skills of a data scientist extend beyond understanding the business relationships of data that are obvious. A data scientist must look for the not-so-obvious relationships by using statistics and predictive models to understand causal relationships.”
Madhu Sumkarpalli, business intelligence consultant, provides this great analogy: “Data modelers are walking on the trail which is somewhat determined and has a predictable destination. For example, data modelers (traditionally) modeled structured data and converted that data into information. Data scientists have to build the trail, they have to discover new paths, new ways and new meaning to the data and exploit the data in whichever form it comes in. Data scientists have to discover how the data can be useful and where it can be useful.”
In my undergraduate studies, I had an aptitude for statistics, and in graduate school I excelled at and really enjoyed operations research. I remember asking my operations research professor what career opportunities exist for an operations researcher. He quickly replied, “Teach.” It is great to see statistics and operations research, two extremely powerful disciplines in mathematics, becoming mainstream.
I firmly believe that data modelers and data scientists have more in common than we have differences. We discussed communication earlier in this column, but at a higher level, both modelers and scientists use data to solve problems – we are great problem solvers. I also know that if you put a data modeler in a graduate-level course on statistics or operations research, he or she would do very well. I also think that if you explain set theory and the rules of normalization to a data scientist, he or she would easily come up to speed. I like thinking of us as just two different types of modelers – ER modelers and analytical modelers.
Over the coming months, I am going to work harder to bridge this gap between these two roles. To get started, I am going to facilitate an early morning session at Data Modeling Zone in Baltimore in October this year on what additional skills a data modeler needs to know to become a data scientist. Stay tuned!
If you’d like to join the more than 4,000 data modelers in the Data Design Challenges group, sign up at www.stevehoberman.com.