A while ago, while working with a group of business users, I encountered an interesting problem: they wanted to account for a trade in a particular way so that they could report on it.

One way of handling an incorrectly booked trade is the "cancel/correct" approach. The original trade record is flagged as canceled, and certain attributes are updated to record who canceled it and why. A new, correct trade record is then created. The users wanted to express this activity in terms of credits and debits, and the cancel/correct approach did not fit their needs. They wanted to keep the original record (say it was a buy trade) without marking it as canceled. Then they wanted to create an offsetting sell trade to net everything out to zero. Finally, they wanted to create a second buy trade record that correctly represented the trade.
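To make the difference between the two approaches concrete, here is a minimal Python sketch. The `Trade` record, its fields, and the functions `cancel_correct`, `offset_and_rebook`, and `net_position` are hypothetical and not taken from any real trading system; a trade is reduced to a side and a quantity, and the example quantities are invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Trade:
    """Hypothetical, minimal trade record: an id, a side, and a quantity."""
    trade_id: int
    side: str                            # "BUY" or "SELL"
    quantity: int
    canceled: bool = False
    canceled_by: Optional[str] = None    # who canceled it (cancel/correct only)
    cancel_reason: Optional[str] = None  # why it was canceled


def cancel_correct(book: List[Trade], bad_id: int, quantity: int,
                   user: str, reason: str) -> List[Trade]:
    """Flag the wrong trade as canceled, then append one corrected trade."""
    original = next(t for t in book if t.trade_id == bad_id)
    new_id = max(t.trade_id for t in book) + 1
    flagged = [replace(t, canceled=True, canceled_by=user, cancel_reason=reason)
               if t.trade_id == bad_id else t
               for t in book]
    return flagged + [Trade(new_id, original.side, quantity)]


def offset_and_rebook(book: List[Trade], bad_id: int, quantity: int) -> List[Trade]:
    """Leave the wrong trade as is, append an offsetting trade, then a new trade."""
    original = next(t for t in book if t.trade_id == bad_id)
    opposite = "SELL" if original.side == "BUY" else "BUY"
    new_id = max(t.trade_id for t in book) + 1
    return book + [
        Trade(new_id, opposite, original.quantity),  # offsetting trade that never happened
        Trade(new_id + 1, original.side, quantity),  # the trade as it should have been booked
    ]


def net_position(book: List[Trade]) -> int:
    """Signed quantity over all trades not flagged as canceled."""
    return sum(t.quantity if t.side == "BUY" else -t.quantity
               for t in book if not t.canceled)


# Hypothetical example: 100 units were booked, but the real trade was for 150.
book = [Trade(1, "BUY", 100)]
fixed = cancel_correct(book, 1, 150, user="ops_user", reason="wrong quantity")
offset = offset_and_rebook(book, 1, 150)

print(net_position(fixed), net_position(offset))        # 150 150
print(sum(not t.canceled for t in fixed), len(offset))  # 1 3
```

Both books net to the same position, which is why the offsetting approach satisfies the credit-and-debit view. The difference lies in what the records claim: the cancel/correct book has one live trade, while the offset book has three live trades, only one of which describes what actually happened.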

Years ago, I would have taken these "requirements" and cheerfully implemented them. Today, having spent almost 20 years working on securitization technology, I have serious ethical misgivings about this kind of nonsense. Given what can follow from bad data, how we manage data has an ethical dimension, and that dimension rests on how we think about truth.

The Correspondence Theory of Truth

Aristotle defined truth as follows: "To say of what is that it is not, or of what is not that it is, is false, while to say of what is that it is, and of what is not that it is not, is true" (Metaphysics, Book IV). Admittedly, this is a mouthful. Suppose I apply Aristotle's definition to another Greek philosopher, Socrates, by making the statements shown in the table. Two of the statements are true and two are false, and it is easy to see how the correspondence theory of truth works: a statement is true when it corresponds to the facts about Socrates, and false when it does not.

Aristotle's definition of truth applies easily to data. If data represents the reality it is supposed to represent, it is true; if it does not, it is false. This may sound a bit odd. In data management, we talk about data quality, but not about the truth of data. In fact, truth can seem like a dangerous topic: if the data is false, perhaps we are doing something illegal.

The trouble is that I do not see how we can put off thinking about this indefinitely. We have had almost 50 years of expanding computer infrastructure that touches vast areas of our lives, and over that time the importance of data, and of managing it, has grown. We may even be entering a Golden Age of data. If so, we are going to have to grow up in the way we do data management.

Consider again the user "requirements" for introducing a "fake" sell trade to offset a wrong buy trade, so that credits and debits show up easily in the books. If we allow ourselves to think of this in terms of truth, problems arise immediately. First, the sell trade never actually happened. Second, we are pretending that there were two buy trades: we have left the first one (the wrongly recorded one) as though it were correct, and then entered a second buy trade as though it were independent of the first. The users who want to see the credits and debits booked nicely are now happy. But a report of the average number of trades per trader per day will count more trades than actually took place. The answer to that problem might be to filter out the fake trades. But why not tell the truth from the start?
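The report problem can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `TradeRow`, its hypothetical `is_adjustment` flag (marking the offsetting and rebooked rows), and the sample traders and dates are all invented for the example. One trader, "alice", has two real trades in a day, one of which was booked wrongly and then offset and rebooked; "bob" has two correctly booked trades.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class TradeRow(NamedTuple):
    trader: str
    day: str                     # ISO date string
    is_adjustment: bool = False  # hypothetical flag: True for offsetting or rebooked rows


def avg_trades_per_trader_per_day(rows: Iterable[TradeRow]) -> float:
    """Average number of rows per (trader, day) pair."""
    counts = Counter((r.trader, r.day) for r in rows)
    return sum(counts.values()) / len(counts) if counts else 0.0


rows = [
    TradeRow("alice", "2024-03-01"),                      # real trade A
    TradeRow("alice", "2024-03-01"),                      # real trade B, booked wrongly
    TradeRow("alice", "2024-03-01", is_adjustment=True),  # offsetting "fake" sell
    TradeRow("alice", "2024-03-01", is_adjustment=True),  # rebooked buy for trade B
    TradeRow("bob", "2024-03-01"),
    TradeRow("bob", "2024-03-01"),
]

print(avg_trades_per_trader_per_day(rows))  # 3.0 (inflated)
print(avg_trades_per_trader_per_day(r for r in rows if not r.is_adjustment))  # 2.0
```

The filter repairs this one report, but only because every offsetting and rebooked row carries the flag. The row left behind for trade B is still the wrongly booked one, so a report on quantities would need a different rule, and every other report that reads the same table has to apply its own version of the filter.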

Once you begin to tell lies, you have to do so consistently, and it always takes more effort than telling the truth. This is a pragmatic consideration, not an ethical one. We should tell the truth because it is right, not because it takes less effort.

However, I am acutely aware that pragmatism is often given as a reason for bending data out of shape, and that I am likely to be criticized by people who will claim that there are only requirements and design solutions, and that truth is irrelevant to data. Nevertheless, I hope that even they would agree that the debate about whether data should reflect truth is worth having. The outcome of such a debate should provide important guidance for data governance.