MAY 1, 2012 4:19am ET

column

Data Management is Based on Philosophy, Not Science


There's a joke running around on Twitter that the definition of a data scientist is “a data analyst who lives in California.” I'm sure the good-natured folks of the Golden State will not object to my bringing this up to make a point. The point is: Thinking purely in terms of marketing, which is the better title -- data scientist or data philosopher?

My instincts tell me there is no contest. The term data scientist conjures up an image of a tense, driven individual, surrounded by complex technology in a laboratory somewhere, wrestling valuable secrets out of the strange substance called data. By contrast, the term data philosopher brings to mind a pipe-smoking elderly gentleman sitting in a winged chair in some dusty recess of academia where he occasionally engages in meaningless word games with like-minded individuals. 

These stereotypes are obviously crude, but they are probably what would come into the minds of most executive managers. Yet how true are they? I submit that there is a strong case that data management is much more like applied philosophy than it is like applied science.

Making Distinctions

I have argued before that a science of data is possible, but by “science” I mean an organized body of knowledge that addresses a particular set of problems using a particular set of methods. Today, “science” is really shorthand for “natural science,” the set of sciences that investigate the material world (physics, chemistry, biology, etc.). The success of natural science over the past five centuries has been indisputably revolutionary.

However, various authors, such as F.A. Hayek and R.G. Collingwood, have argued that this success has given rise to scientism (this is Hayek's word; Collingwood's term was pseudo-science). Scientism is the misapplication of the language and methods of natural science to departments of human experience which are utterly unlike the material world that natural science studies. The reason this is done is to pretend that the proven success of natural science can be transferred to these otherwise difficult areas. According to Hayek, this has had a negative impact in the so-called social sciences as a result of scientistic innovations such as Keynesianism and Socialism. According to Collingwood, uncritical acceptance of natural science will ultimately destroy the foundations of Western civilization and usher in a new era of barbarism.

So what does this have to do with data management? Well, as a trained scientist myself (a biologist), I have often reflected that with respect to what I do in data management, I have learned nothing from science. It is true that the technology that makes data management possible is based on engineering, which is based on science, but data management is not about this technology any more than writing is about paper and ink. There simply seem to be no lessons learned from natural science that can be directly transferred to data management.

The Role of Philosophy

By contrast, there are lessons derived from philosophy that can be applied to data management. Here are a few:

1. The theory and practice of definitions, which are a very old part of logic and are the basis of semantics.

2. The rules of normalization, which are derived from logic.   

3. The differences between generic (supertype-subtype) and partitive (part-whole) types of conceptual system, which are yet another set of lessons from philosophy.

4. The principles of logical division and classification, which are used in constructing taxonomies, and go back to Aristotle.

5. The approach of structural decomposition in business analysis, which can be found in Descartes' Method.  

6. The basic vocabulary of data management (e.g., entity, attribute, relationship), which goes back more than two millennia in philosophy.
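
The third lesson in the list, the difference between generic and partitive relations, is the one that most often shows up directly in a data model, so a small illustration may help. What follows is only a sketch: the class names (Party, Person, Organization, Order, OrderLine) are hypothetical examples chosen for illustration and do not come from any particular model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """Supertype (genus): anything the business can deal with."""
    name: str


@dataclass
class Person(Party):
    """Subtype (species): a Person *is a kind of* Party."""
    birth_year: int = 0


@dataclass
class Organization(Party):
    """Subtype (species): an Organization *is a kind of* Party."""
    registration_no: str = ""


@dataclass
class OrderLine:
    """A part: an OrderLine is *part of* an Order, not a kind of Order."""
    product: str
    quantity: int


@dataclass
class Order:
    """A whole that is composed of its parts."""
    customer: Party
    lines: List[OrderLine] = field(default_factory=list)


alice = Person(name="Alice", birth_year=1980)
order = Order(customer=alice, lines=[OrderLine("widget", 3)])

# Generic relation: every Person is also a Party.
is_kind_of = isinstance(alice, Party)

# Partitive relation: an OrderLine belongs to an Order but is not one.
part_is_kind_of_whole = isinstance(order.lines[0], Order)
```

The generic relation is expressed through inheritance and the partitive relation through containment. Confusing the two, for example by modeling an order line as a subtype of order, is exactly the kind of error that the philosophical distinction guards against.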

Philosophy versus Science

Recently, the physicist Stephen Hawking announced that “Philosophy is dead.” He claims that it has not kept up with modern developments in science, particularly physics. By contrast, R.G. Collingwood, writing in the mid-twentieth century, proposed that philosophy provides a framework within which all natural science is possible - it is a kind of “master science.” It seems very likely to me that data management is part of this master science, and if this is true it will position data management in philosophy, and not in natural science.

Let us consider definitions as an example. The International Astronomical Union caused outrage a few years ago when it redefined “planet” so as to exclude Pluto. The new IAU definition states that a planet:

is a celestial body that (a) is in orbit around the Sun, (b) has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and (c) has cleared the neighbourhood around its orbit.

There are some problems here. The first is that the IAU is actually defining “planets and other bodies in our Solar System, except satellites.” It is not strictly defining planet. Does the IAU maintain there are only planets in “our solar system”? What exactly are all the extrasolar planets we have been hearing about for the past few years? Why is the term “nearly round” used in the definition? A circle can be round, and only exists in two dimensions. Surely “nearly spherical” would have been more accurate. And what exactly is the “neighbourhood” referred to in the definition? This point is very important because it is the only point by which Pluto is held to differ from true “planets.” The IAU saw fit to define “planet” and “dwarf planet.” If they had followed classical logic they would have realized they needed either a superordinate class or a coordinate class, and they would have given us something like “planet” (the genus or supertype) and “dwarf planet” and “un-dwarf planet” (the species or subtypes).
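
To make the genus-and-species alternative concrete, here is a minimal sketch of how such a classification might be written down. It is one possible reading, not the IAU's scheme: it assumes that criteria (a) and (b) define the genus “planet” and that criterion (c) is the differentia separating the two species. The three boolean flags are a deliberate simplification, and all names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CelestialBody:
    name: str
    orbits_sun: bool             # criterion (a)
    nearly_round: bool           # criterion (b), simplified to a flag
    cleared_neighbourhood: bool  # criterion (c), simplified to a flag


def is_planet(body: CelestialBody) -> bool:
    """Genus test (assumed): criteria (a) and (b) only."""
    return body.orbits_sun and body.nearly_round


def classify(body: CelestialBody) -> str:
    """Assign a body to the genus, then to one of its two species."""
    if not is_planet(body):
        return "not a planet"
    # Differentia (assumed): criterion (c) splits the genus into species.
    if body.cleared_neighbourhood:
        return "un-dwarf planet"
    return "dwarf planet"


earth = CelestialBody("Earth", True, True, True)
pluto = CelestialBody("Pluto", True, True, False)

earth_class = classify(earth)
pluto_class = classify(pluto)
```

Under this reading every body that passes the genus test falls into exactly one species, which is what classical logical division requires: the subtypes are mutually exclusive and together exhaust the supertype.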

Comments (15)
Absolutely the nerdiest thing I will read today... and I loved it. What you lay out is not so much that data management is a science, but actually a tool of science. So what then is data analysis? Research?

I for one, love the term 'data scientist'. I think it lends credibility to a growing field of analysts.

Thank you for writing this article!

Posted by Jody C | Tuesday, May 01 2012 at 10:51AM ET
I have a couple of philosophy degrees, primarily focused on epistemology, logic, and math. I think you're dead on, and I probably do more straightforward philosophy in a week than my counterparts in academia. There are three areas where I use my education all the time:

1. Data modeling is classic foundational analysis. It's a mix of semantics and logic, and while it may look to outsiders like metaphysics, it's not metaphysical at all. See e.g. Montague's project at UCLA to reduce language to logic.

2. Business Intelligence is all about justification - which is the branch of philosophy called Epistemology - or how we *know* that something is the case. Classically "knowledge" is defined as "justified true belief" but there are many theories of how this works. I'm personally more of what's called a "reliabilist."

3. MDM is actually the problem of meaning, or how it is that you can discover the referent of two senses. The classic example is "'the morning star' denotes 'venus'" and "'the evening star' denotes 'venus'", but "morning star" and "evening star" are not the same thing, in much the same way one shouldn't use maiden name and married name or mailing address and billing address interchangeably.

There are a few of the gray-hairs with wing-backed chairs left, sadly fewer than there used to be. But for the most part the philosophers you're referring to are as likely to be influencing people at Xerox PARC, SRI or Microsoft as they are to be teaching introductory logic for the 30th time.

Posted by David G | Tuesday, May 01 2012 at 11:10AM ET