MAY 1, 2012 4:19am ET

column

Data Management is Based on Philosophy, Not Science


There's a joke running around on Twitter that the definition of a data scientist is “a data analyst who lives in California.” I'm sure the good-natured folks of the Golden State will not object to my bringing this up to make a point. The point is: Thinking purely in terms of marketing, which is a better title -- data scientist or data philosopher?

My instincts tell me there is no contest. The term data scientist conjures up an image of a tense, driven individual, surrounded by complex technology in a laboratory somewhere, wrestling valuable secrets out of the strange substance called data. By contrast, the term data philosopher brings to mind a pipe-smoking elderly gentleman sitting in a winged chair in some dusty recess of academia where he occasionally engages in meaningless word games with like-minded individuals. 

These stereotypes are obviously crude, but they are probably what would come into the minds of most executive managers. Yet how true are they? I submit that there is a strong case that data management is much more like applied philosophy than it is like applied science.

Making Distinctions

I have argued before that a science of data is possible, but by “science” I mean an organized body of knowledge that addresses a particular set of problems using a particular set of methods. Today, “science” is really shorthand for “natural science,” which is the set of sciences that investigate the material world (meaning physics, chemistry, biology, etc.). The success of natural science over the past five centuries has been indisputably revolutionary.

However, various authors, such as F.A. Hayek and R.G. Collingwood, have argued that this success has given rise to scientism (this is Hayek's word; Collingwood's term was pseudo-science). Scientism is the misapplication of the language and methods of natural science to departments of human experience which are utterly unlike the material world that natural science studies. The reason this is done is to pretend that the proven success of natural science can be transferred to these otherwise difficult areas. According to Hayek, this has had a negative impact in the so-called social sciences as a result of scientistic innovations such as Keynesianism and Socialism. According to Collingwood, uncritical acceptance of natural science will ultimately destroy the foundations of Western civilization and usher in a new era of barbarism.

So what does this have to do with data management? Well, as a trained scientist myself (a biologist), I have often reflected that with respect to what I do in data management, I have learned nothing from science. It is true that the technology that makes data management possible is based on engineering, which is based on science, but data management is not about this technology any more than writing is about paper and ink. There simply seem to be no lessons learned from natural science that can be directly transferred to data management.

The Role of Philosophy

By contrast, there are lessons that are derived from philosophy that can be applied to data management.  Here are a few:

1. The theory and practice of definitions, which are a very old part of logic and are the basis of semantics.

2. The rules of normalization, which are derived from logic.   

3. The distinction between generic (supertype-subtype) and partitive (part-whole) types of conceptual system, which is yet another lesson from philosophy.

4. The principles of logical division and classification, which are used in constructing taxonomies, and go back to Aristotle.

5. The approach of structural decomposition in business analysis, which can be found in Descartes' Method.  

6. The basic vocabulary of data management (e.g., entity, attribute, relationship), which goes back more than two millennia in philosophy.
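
As a rough illustration of the third lesson in the list above, the following Python sketch contrasts a generic (supertype-subtype) relation with a partitive (part-whole) relation. The names `Vehicle`, `Car`, and `Wheel` are hypothetical and chosen only for this example; they do not come from any particular data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vehicle:
    """Hypothetical supertype (genus): every subtype is a kind of Vehicle."""
    name: str


@dataclass
class Wheel:
    """Hypothetical part: a Wheel is not a kind of Car, it belongs to one."""
    position: str


@dataclass
class Car(Vehicle):
    """Generic relation: a Car is a kind of Vehicle (subtype of a supertype)."""
    # Partitive relation: a Car is a whole that has Wheels as its parts.
    wheels: List[Wheel] = field(default_factory=list)


car = Car(name="sedan", wheels=[Wheel("front-left"), Wheel("front-right")])

# The generic relation answers "is a kind of".
print(isinstance(car, Vehicle))        # True: a car is a kind of vehicle

# The partitive relation answers "is a part of"; a part is not a subtype.
print(isinstance(car.wheels[0], Car))  # False: a wheel is not a kind of car
```

Modeling a wheel as a subtype of car, or a car as a part of vehicle, would confuse the two kinds of conceptual system, which is the mistake the distinction is meant to prevent.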

Philosophy versus Science

Recently, the physicist Stephen Hawking announced that “Philosophy is dead.” He claims that it has not kept up with modern developments in science, particularly physics. By contrast, R.G. Collingwood, writing in the mid-twentieth century, proposed that philosophy provides a framework within which all natural science is possible - it is a kind of “master science.” It seems very likely to me that data management is part of this master science, and if this is true, it positions data management in philosophy, and not in natural science.

Let us consider definitions as an example. The International Astronomical Union caused outrage a few years ago when it redefined “planet” so as to exclude Pluto. The new IAU definition states that a planet:

is a celestial body that (a) is in orbit around the Sun, (b) has sufficient mass for its self-gravity to overcome rigid body forces so that it assumes a hydrostatic equilibrium (nearly round) shape, and (c) has cleared the neighbourhood around its orbit.

There are some problems here. The first is that the IAU is actually defining “planets and other bodies in our Solar System, except satellites.” It is not strictly defining planet. Does the IAU maintain that planets exist only in “our solar system”? What exactly are all the extrasolar planets we have been hearing about for the past few years? Why is the term “nearly round” used in the definition? A circle is round, yet it exists only in two dimensions. Surely “nearly spherical” would have been more accurate. And what exactly is the “neighbourhood” referred to in the definition? This point is very important because it is the only point on which Pluto is held to differ from true “planets.” The IAU saw fit to define “planet” and “dwarf planet.” If they had followed classical logic they would have realized they needed either a superordinate class or a coordinate class, and they would have given us something like “planet” (the genus or supertype) and “dwarf planet” and “un-dwarf planet” (the species or subtypes).
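
The classical structure suggested here, a genus divided into coordinate species by a single differentia, can be sketched in Python. This is only an illustration under stated assumptions: the attribute names, the use of “orbits a star” in place of “orbits the Sun,” and the sample values are simplifications made for this example, not the IAU's wording or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Body:
    """A celestial body described by illustrative, simplified attributes."""
    name: str
    orbits_star: bool          # assumption: generalizes "in orbit around the Sun"
    nearly_spherical: bool     # assumption: stands in for hydrostatic equilibrium
    has_cleared_orbit: bool    # the differentia that separates the two species


def is_planet(body: Body) -> bool:
    """Genus (supertype): membership does not depend on the differentia."""
    return body.orbits_star and body.nearly_spherical


def species(body: Body) -> Optional[str]:
    """Divide the genus into two coordinate species by one differentia."""
    if not is_planet(body):
        return None  # outside the genus, so no species applies
    return "un-dwarf planet" if body.has_cleared_orbit else "dwarf planet"


# Illustrative values only; "Rock-1" is a made-up irregular body.
earth = Body("Earth", True, True, True)
pluto = Body("Pluto", True, True, False)
rock = Body("Rock-1", True, False, False)

for body in (earth, pluto, rock):
    print(body.name, "->", species(body))
```

Because both species are defined under the same genus and by the same differentia, every planet falls into exactly one of them, which is the property a logical division is supposed to guarantee.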


Comments (15)
Although great, I think there still is imprecision in the column. Data Science might be closest to the Philosophical area, Epistemology (roughly, philosophy of knowing).

The tool/method-centric approach that is popular now is valuable, but going back to the roots and limits of those tools and methods is critical. One might check the somewhat obscure philosopher Wittgenstein and the more popular Popper for the limits of science/data regarding knowledge. (Although I've just given away my edge in the field.)

If I ever do the Ph.D., that would be my area. Interested in your and others' views on my comments.

Posted by Phil M | Tuesday, May 01 2012 at 11:16AM ET
Agile developers would love to refer to data analysts as philosophers to malign the discipline of data management as opinion while they deal in facts. It is true that because data management occurs primarily in the planning, analysis, and design phases, that a derived benefit in dollars is difficult to estimate. Data Management in fact is closer to geometry in that there are basic assumptions that must be made, but if followed they provide benefits, such as: one fact one place, management understanding the logical structure of their data to manage the enterprise, parallel the physical with the logical as much as reasonable, share data, implement/maintain data quality. If developers design your database they tend to optimize it to support their development rather than the actual system operation. If designed by report analysts, one tends to find many copies of the same data in the data warehouse, sometimes one for each report, which is responsible for the volume of data multiplying by 2 every seven years. Data analysis is really common sense in harmony with the KISS principle. Principled data management is similar to principled actions in any other walk of life, can be called philosophy or opinion, but is necessary to successfully maintain control of an enterprise's data, limit cost growth, and to stem chaos.
Posted by Thomas B | Tuesday, May 01 2012 at 11:20AM ET