The Database State

  • July 23 2008, 12:59pm EDT

Technical issues tend, quite naturally, to be the focus of attention for data professionals, but those who call themselves “professionals” have a responsibility to pause occasionally and reflect on the wider impact of their profession. Data professionals are not exempt from this.

We are now at a point where some reflection is necessary. The fact is that the true nature of the information age is still only felt in outline. Nobody has yet been able to explain it as coherently as Adam Smith explained the industrial age in The Wealth of Nations. Indeed, the outline of the information age keeps changing, and a major trend is the shift in focus away from automation and toward data. While this is a very positive development for the data management profession, it also means that our activities will likely come under more scrutiny.

There is growing public unease about the ways in which data of all kinds is being collected, integrated and managed by governments. To be fair, this concern extends to entities other than governments, but governments are the biggest area of concern. The prospect is now emerging for governments to manage massive amounts of integrated data about a huge range of individuals and activities within their jurisdictions. The potential exists for the emergence of the database state.

The prospect of a database state is the subject of intense debate in Britain, particularly around the topic of DNA. The criminal justice system of the country already has one of the world’s largest databases of DNA profiles. This database contains records for about 4.5 million individuals out of a population of 60 million. Some recent high-profile murder cases have prompted serious proposals from a few politicians for full compulsory DNA registration for the entire population. In fact, a large percentage of the records already in the existing database are for individuals who have no criminal record. The number could be as high as 1 million, of which more than 100,000 are children. A movement is under way to have these records removed. Any data professional would wonder exactly what “removed” means.

The most worrying prospect is the possibility of integrating the British DNA database with tax records, medical records, driving licenses, regulatory banking information and so on. In this respect, Britain is a very important country in the debate over the database state. It has a much more highly centralized form of government than is found in most industrialized countries. England, in particular, has little in the way of regional government of the kind found in U.S. or German states. The UK as a whole has even less in terms of local government. For instance, apart from the important exception of London, there are no mayors with executive authority, unlike in France and the U.S.

The centralized administrative departments of the UK government already have large, centralized databases. If the UK could achieve cross-departmental data integration, this could be a model for anyone interested in running a highly centralized authoritarian state. Such individuals still exist in the modern world.

Data professionals, more than anyone else, are aware of the technical challenges to large-scale data integration. Even in the UK, there are issues. The British National Health Service (NHS) is currently having difficulties with an overhaul of its medical record-keeping systems, one of the largest IT projects in the world. I am not suggesting that the British government is interested in or planning for large-scale, cross-departmental data integration. On the other hand, I am informed by a doctor friend that the only organization larger than the British NHS is the Chinese People’s Liberation Army.

When the technical difficulties of data integration are combined with the natural inefficiencies found in the public sector, it is tempting to think that a database state may be a remote prospect. Yet much of the technology for large-scale data integration and management now exists. A few key areas, like source data analysis, are still not handled well. A larger gap exists in terms of integration methodologies, especially for master data management. The least well-developed area is probably data governance.

What if these problems could be overcome? What responsibility would the data profession then have for enabling a database state? I recently met a gentleman from Venezuela who explained that when he went for a job interview with the state oil company, the interviewer simply looked him up on the online electoral rolls, found he was not affiliated with President Chavez’s party and told him to leave. What responsibility do the developers of such a database and application bear?

At this point, we encounter one of the oldest ethical debates in human history. We can make things that can be used for good or bad. In this case, we are talking about large-scale integrated databases under government control. Do we or do we not have any ethical responsibility for how the databases we build are used? Should we take an explicit stand? For instance, we could declare it unethical to integrate data from two or more predefined subject areas.

Or we data professionals could simply stay at the technical level and do our jobs. Would that be “just following orders”? I do not pretend to know the answers. However, I do know that the data profession is the repository of certain types of expertise, which makes us uniquely positioned to answer certain kinds of questions. That alone gives us some level of responsibility.

