My instincts tell me there is no contest. The term data scientist conjures up an image of a tense, driven individual, surrounded by complex technology in a laboratory somewhere, wrestling valuable secrets out of the strange substance called data. By contrast, the term data philosopher brings to mind a pipe-smoking elderly gentleman sitting in a winged chair in some dusty recess of academia where he occasionally engages in meaningless word games with like-minded individuals.
These stereotypes are obviously crude, but they are probably what would come into the minds of most executive managers. Yet how true are they? I submit that there is a strong case that data management is much more like applied philosophy than it is like applied science.
Making Distinctions
I have argued before that a science of data is possible, but by “science” I mean an organized body of knowledge that addresses a particular set of problems using a particular set of methods. Today, science is really a shorthand reference to “natural science,” which is the set of sciences that investigate the material world (meaning physics, chemistry, biology, etc.). The success of natural science over the past five centuries has been indisputably revolutionary.
However, various authors, such as F.A. Hayek and R.G. Collingwood, have argued that this has given rise to scientism (this is Hayek's word; Collingwood's term was pseudo-science). Scientism is the misapplication of the language and methods of natural science to departments of human experience which are utterly unlike the material world that natural science studies. The reason this is done is to pretend that the proven success of natural science can be transferred to these otherwise difficult areas. According to Hayek, this has had a negative impact in the so-called social sciences as a result of scientistic innovations such as Keynesianism and Socialism. According to Collingwood, uncritical acceptance of natural science will ultimately destroy the foundations of Western civilization and usher in a new era of barbarism.
So what does this have to do with data management? Well, as a trained scientist myself (a biologist), I have often reflected that with respect to what I do in data management, I have learned nothing from science. It is true that the technology that makes data management possible is based on engineering, which is based on science, but data management is not about this technology any more than writing is about paper and ink. There simply seem to be no lessons learned from natural science that can be directly transferred to data management.
The Role of Philosophy
By contrast, there are lessons that are derived from philosophy that can be applied to data management. Here are a few:
1. The theory and practice of definitions, which are a very old part of logic and are the basis of semantics.
2. The rules of normalization, which are derived from logic.
3. The differences between generic (supertype-subtype) and partitive (part-whole) types of conceptual system.
4. The principles of logical division and classification, which are used in constructing taxonomies, and go back to Aristotle.
5. The approach of structural decomposition in business analysis, which can be found in Descartes' Method.
6. The basic vocabulary of data management (e.g., entity, attribute, relationship), which goes back more than two millennia in philosophy.
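To make the third item concrete, here is a minimal sketch of how the two kinds of conceptual system tend to surface in a data model. All class and field names (Party, Person, Organization, Order, OrderLine) are hypothetical and chosen only for illustration; the generic relation is shown as inheritance and the partitive relation as containment.

```python
from dataclasses import dataclass, field
from typing import List


# Generic (supertype-subtype): every Person IS A Party.
@dataclass
class Party:
    name: str


@dataclass
class Person(Party):
    date_of_birth: str = ""


@dataclass
class Organization(Party):
    registration_no: str = ""


# Partitive (part-whole): an Order HAS OrderLines.
# A line is a part of an order, not a kind of order.
@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    customer: Party
    lines: List[OrderLine] = field(default_factory=list)

    def total_quantity(self) -> int:
        # Sum over the parts that make up the whole.
        return sum(line.quantity for line in self.lines)


alice = Person(name="Alice", date_of_birth="1980-01-01")
order = Order(customer=alice, lines=[OrderLine("widget", 2), OrderLine("gadget", 3)])
```

Confusing the two relations is a classic modelling error: a subtype inherits everything that is true of its supertype, whereas a part inherits nothing from its whole.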
Philosophy versus Science
Recently, the physicist Stephen Hawking announced that “Philosophy is dead.” He claims that it has not kept up with modern developments in science, particularly physics. By contrast, R.G. Collingwood, writing in the mid-twentieth century, proposed that philosophy provides a framework within which all natural science is possible: it is a kind of “master science.” It seems very likely to me that data management is part of this master science, and if this is true it will position data management in philosophy, and not in natural science.
Let us consider definitions as an example. The International Astronomical Union caused outrage a few years ago when it redefined “planet” so as to exclude Pluto. The new IAU definition states that a planet:
is a celestial body that (a) is in orbit around the Sun, (b) has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and (c) has cleared the neighbourhood around its orbit.
There are some problems here. The first is that the IAU is actually defining “planets and other bodies in our Solar System, except satellites.” It is not strictly defining “planet.” Does the IAU maintain there are only planets in “our solar system”? What exactly are all the extrasolar planets we have been hearing about for the past few years? Why is the term “nearly round” used in the definition? A circle can be round, and only exists in two dimensions. Surely “nearly spherical” would have been more accurate. And what exactly is the “neighbourhood” referred to in the definition? This point is very important because it is the only point by which Pluto is held to differ from true “planets.” The IAU saw fit to define “planet” and “dwarf planet.” If they had followed classical logic they would have realized they needed either a superordinate class or a coordinate class, and they would have given us something like “planet” (the genus or supertype) and “dwarf planet” and “un-dwarf planet” (the species or subtypes).
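The genus-and-species structure suggested above can be sketched in a few lines. This is only one possible reading of the suggestion, not the IAU's scheme: criteria (a) and (b) are taken as defining the genus “planet,” and criterion (c) as the differentia that separates the two species. The class and function names are hypothetical, and criterion (c) is supplied as a plain yes/no flag precisely because the definition never says what the “neighbourhood” is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CelestialBody:
    name: str
    orbits_sun: bool               # criterion (a)
    hydrostatic_equilibrium: bool  # criterion (b)
    cleared_neighbourhood: bool    # criterion (c), undefined in the IAU text


def is_planet(body: CelestialBody) -> bool:
    """Genus (supertype): satisfies criteria (a) and (b)."""
    return body.orbits_sun and body.hydrostatic_equilibrium


def species(body: CelestialBody) -> Optional[str]:
    """Species (subtype): criterion (c) is the differentia within the genus."""
    if not is_planet(body):
        return None  # outside the genus, so no species applies
    return "un-dwarf planet" if body.cleared_neighbourhood else "dwarf planet"


# Flag values follow the IAU's own ruling on each body.
earth = CelestialBody("Earth", True, True, True)
pluto = CelestialBody("Pluto", True, True, False)
```

Under this scheme Pluto remains a planet at the level of the genus and differs from Earth only at the level of the species, which is the result classical logical division would have produced.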
I, for one, love the term 'data scientist'. I think it lends credibility to a growing field of analysts.
Thank you for writing this article!
1. Data modeling is classic foundational analysis. It's a mix of semantics and logic, and while it may look to outsiders like metaphysics, it's not metaphysical at all. See e.g. Montague's project at UCLA to reduce language to logic.
2. Business Intelligence is all about justification, which belongs to the branch of philosophy called Epistemology, or how we *know* that something is the case. Classically "knowledge" is defined as "justified true belief" but there are many theories of how this works. I'm personally more of what's called a "reliabilist."
3. MDM is actually the problem of meaning, or how it is that you can discover the referent of two senses. The classic example is "'the morning star' denotes 'Venus'" and "'the evening star' denotes 'Venus'", but "morning star" and "evening star" are not the same thing, in much the same way one shouldn't use maiden name and married name, or mailing address and billing address, interchangeably.
There are a few of the gray-hairs with wing-backed chairs left, sadly fewer than there used to be. But for the most part the philosophers you're referring to are as likely to be influencing people at Xerox PARC, SRI or Microsoft as they are to be teaching introductory logic for the 30th time.
The tool/method-centric approach that is popular now is valuable, but going back to its roots and limits is critical. One might check the somewhat obscure philosopher Wittgenstein and the more popular Popper for the limits of science/data regarding knowledge. (Although I've just given away my edge in the field.)
If I ever do the Ph.D., that would be my area. Interested in your and others' views on my comments.
There are other aspects of Data Management that generally fall under the topic of data governance and stewardship. I would say these are closely aligned with Management. Borrowing from Wikipedia, management is "the act of getting people together to accomplish desired goals and objectives using available resources efficiently and effectively."
Data Science exists as another class of activity, more specialized than Data Management but reliant on it. Again borrowing from Wikipedia, science "is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions." Clearly there are many computational physicists, biologists, economists, and others who organize data as a representation of knowledge. They build computational models that produce testable explanations and predictions.
I do work in a university where we have programs that encourage students to apply, analyze, synthesize, and evaluate using data modeling and mathematics. (See marquette.edu/computing or marquette.edu/mscs) The work follows the guidance of the Scientific Method. We produce Data Scientists who have been taught Data Management.
The boundary between data and science is a fruitful one that has produced significant impact. We ought to use the term when it is appropriate.
My only edit would be to change the title of the article to Data Management "ought to be" Based on Philosophy, Not "just Technology"
An "ought" is not an "is". Philosophy is science.
Data "scientists" are often technologists. Science involves more than technology.
thanks again -
Thank you for the article. There seems to be less and less foundational theory being taught in CS in schools now, or maybe it is just the study-only-the-current-commercial-apps approach of two-year specialty "colleges" and "institutes".
Maybe with the upsurge of the Semantic Web and ontologies this will change. Philosophers have been doing mereology, logic and ontologies since Plato. We philosophy students have a 2,500-year head start over ITT Tech majors.
Gary D.
Where I disagree with the article is in some of the old practices of data modeling that are not based on philosophy and the metaphysical world of conceptual patterns, but on physical world modeling of entities. This is where past practices have gone wrong for effective data management. We should have stayed true to the root concepts in philosophy, discovered the appropriate patterns, and not have bought into commercial tools and methods.
While a data scientist may claim that using more sophisticated algorithms and more sophisticated tools (a commercial usually follows at this point) delivers a new kind of information, the basic truth is that if you put garbage data into anything you only get garbage information out, no matter how sophisticated the algorithms used. Add disparate context and confused meanings to the data and the "information out" is likely misleading and dangerous.