It's like one of those propaganda newsreels from the 1940s: "Marketing Software Goes to War." Yes, the lowly duplicate detection systems designed to eliminate unnecessary catalogs are now hunting bigger game – terrorists at our borders. Fraud detection software has graduated from guarding your cell phone to protecting airports, water supplies and ballparks. Credit card databases that once minded your spending limit are now watching your every move for suspicious behavior.

Like pictures of determined Cub Scouts searching the skies for Nazi warplanes, this is all very cute but doesn't inspire much confidence. One hopes there are professionals hidden in the background who have been working on the problems for years and have already come up with more reliable solutions. At a minimum, systems professionals must hope that the people in charge realize how error-prone any automated approach must be.

Just what the people in charge may or may not realize is itself a well-kept secret. One public face of the government's effort is the DARPA Information Awareness Office, which lists the ambitious goal of achieving "total information awareness that is useful for preemption, national security warning and national security decision making." Specific projects include "focused warnings within an hour after a triggering event occurs" (Total Information Awareness System), "technology enabling ultra-large, all-source information repositories" (Genisys), "automated discovery, extracting and linking of sparse evidence contained in large amounts of classified and unclassified data sources" (Evidence Extraction and Link Discovery) and "automated and adaptive behavior prediction models tuned to specific terrorist groups and individuals" (Wargaming the Asymmetric Environment). This is in addition to more prosaic work on machine translation, text analysis, physical surveillance and decision support.

Published descriptions show that some of these programs are based on previous projects within the intelligence community. Private conversations with commercial vendors reveal that they, too, are involved in this sort of work, though not necessarily in these particular DARPA projects. (In fact, in February, the U.S. Congress placed severe restraints on the Information Awareness program. However, it's safe to assume similar research will proceed elsewhere.)

How reliable are these new surveillance systems likely to be? If secret government research has produced advanced solutions, the answer should not be based on the performance of commercial products. Yet the continued involvement of commercial vendors in these projects suggests that their technologies are as good as the government's own. Another hint is that the government uses commercial technology for many of its own deployed systems and has delegated critical surveillance functions – such as watching for suspicious funds transfers – to private institutions that themselves use mostly commercial software. Additionally, at the most basic level, both government and commercial researchers must deal with the same fundamental constraints imposed by the nature of the underlying tasks themselves.

What are those tasks? Apart from issues related to language interpretation, the DARPA program really focuses on two challenges: assembling data from multiple sources and making predictions based on behavior patterns. Both have long been the subject of commercial activity.

Data assembly has a mechanical component, which is essentially the extract, transform and load (ETL) function long familiar to data warehouse developers. A "total" surveillance system would surely supplement batch-oriented ETL with real-time data feeds. While this is a different process, it is also reasonably well-understood. In short, there are few theoretical obstacles to data assembly, although there will be many practical challenges at the gargantuan scale required for a total surveillance system.

One such theoretical obstacle is establishing identity. Information with a specific ID, such as a passport or credit card number, can easily be tied to other information using the same ID. However, different sources use different ID systems, and the ID numbers themselves can be misreported, intentionally or not. Thus, any surveillance system must deal with the decidedly non-mechanical problem of linking records based on name, address and other information that itself may be non-unique and inconsistently recorded.
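To make the linking problem concrete, here is a minimal sketch of scoring two records on name and address similarity when no shared ID exists. Everything in it is an assumption made for illustration: the record fields, the 60/40 weighting and the use of Python's standard difflib for string comparison. Commercial linking products use far more elaborate techniques.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(rec_a, rec_b, name_weight=0.6, address_weight=0.4):
    """Weighted similarity of two records (the weights are arbitrary assumptions)."""
    return (name_weight * similarity(rec_a["name"], rec_b["name"])
            + address_weight * similarity(rec_a["address"], rec_b["address"]))


# Hypothetical records from two sources that share no ID number.
passport = {"name": "Jon Smith", "address": "12 Main St."}
card = {"name": "John Smith", "address": "12 Main Street"}
stranger = {"name": "Maria Lopez", "address": "400 Oak Ave"}

print(match_score(passport, card))      # relatively high score
print(match_score(passport, stranger))  # relatively low score
```

The score is only a measure of resemblance. Deciding how high it must be before two records are treated as the same person is a separate judgment.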

Happily, governments and businesses already have more effective commercial linking technology than the standard deduplication systems used for catalog mailings. However, even this technology is imperfect, and any mistake is costly. A missed match means a terrorist passes undetected, while a false positive can disrupt the life of an innocent person. The obvious solution – accepting a number of false positives to avoid missing any true matches – works in applications such as border control, but causes unacceptable pollution when assembling dossiers in a database.
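The trade-off between missed matches and false positives can be sketched with a handful of invented numbers. The scored pairs below are made up for illustration, not drawn from any real system. Each pair carries a match score and a flag saying whether the two records truly belong to the same person.

```python
def count_errors(scored_pairs, threshold):
    """Return (false_positives, missed_matches) for a given match threshold."""
    false_positives = sum(1 for score, same in scored_pairs
                          if score >= threshold and not same)
    missed_matches = sum(1 for score, same in scored_pairs
                         if score < threshold and same)
    return false_positives, missed_matches


# Hypothetical (score, truly_same_person) pairs.
SCORED_PAIRS = [
    (0.95, True), (0.88, True), (0.81, False), (0.74, True),
    (0.69, False), (0.52, False), (0.40, False),
]

for threshold in (0.9, 0.7, 0.5):
    fp, missed = count_errors(SCORED_PAIRS, threshold)
    print(f"threshold {threshold}: {fp} false positives, {missed} missed matches")
```

With these invented figures, the strictest threshold misses two true matches and wrongly links no one, while the loosest misses none and wrongly links three pairs. No setting removes both kinds of error at once.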

Of course, data generated by terrorists contains more than random errors. They hide their tracks intentionally through simple tricks such as using initials and variant name spellings, and more advanced ones such as using multiple identities. Software can counter some techniques: for example, it can link all identities at the same address. Yet, clever operatives will establish – or steal – identities that have no logical connection. Therefore, perfect linking will never be possible.
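Linking identities through a shared address might look something like the sketch below. The records and the simple address normalization are assumptions made for illustration. The last record shows the limitation: an identity with no logical connection to the others is never grouped with them.

```python
from collections import defaultdict


def normalize_address(address):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def link_by_address(records):
    """Group names by normalized address; keep only addresses with 2+ names."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_address(rec["address"])].append(rec["name"])
    return {addr: names for addr, names in groups.items() if len(names) > 1}


# Hypothetical records: two aliases share an address, a third does not.
records = [
    {"name": "J. Smith", "address": "12 Main St."},
    {"name": "Jon Smythe", "address": "12 MAIN ST"},
    {"name": "Alan Brown", "address": "77 Harbor Rd"},
]

print(link_by_address(records))
```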

The second DARPA focus is predicting behavior based on patterns. This is another field with an established commercial market, both in security matters such as fraud detection and in marketing applications such as response and attrition modeling. Like linking software, pattern detection systems face a fundamental limit on their accuracy; they need previous patterns to use as a basis for prediction. While it's unlikely that twenty Middle Eastern men could take flight training today without being noticed, it's equally unlikely that terrorists will try. Instead, they'll do something different – and if it's truly unique, no pattern detection system will notice.

Of course, it's a good thing that the U.S. has seen too few terrorist incidents to identify many patterns. It's possible that some information can be gained from other places where terrorism is more common. It's also possible that a system might simply highlight activity that's unusual without necessarily being suspicious. This could at least lead to closer investigation – so long as the investigators have a high tolerance for false alarms. However, it's hard to imagine that pattern detection systems will ever provide anything close to comprehensive protection.
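Highlighting activity that is merely unusual can be sketched as a simple outlier test. The transaction amounts, the three-standard-deviation cutoff and the function name below are all assumptions chosen for illustration. Real fraud detection systems rely on far richer models.

```python
from statistics import mean, stdev


def flag_unusual(history, new_values, z_threshold=3.0):
    """Return new values lying more than z_threshold standard deviations
    from the mean of the historical values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: anything different counts as unusual.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > z_threshold]


# Hypothetical past transaction amounts for one account.
history = [20, 25, 22, 30, 18, 27, 24, 21]

print(flag_unusual(history, [23, 26, 900]))
```

A flag raised this way says only that a value is rare for this account. Whether it is suspicious is a question for the investigator, which is where the tolerance for false alarms comes in.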

In short, the two key tasks at the heart of total surveillance are inherently limited. All data regarding an individual can never be perfectly linked and patterns predicting terrorist acts can never be perfectly detected. Of course, perfection is an unnecessarily high standard – remember that DARPA's own statement sets the bar at a much less ambitious "useful." Yet it's important to recognize that these systems have very real limitations. Readers of this magazine will interact with these systems as professionals by working on them, feeding them data, receiving their outputs or commenting on them to others. This gives us a special responsibility to insist that total surveillance be treated with the same intelligent skepticism as any other systems project, or – with both liberty and security at stake – perhaps even a bit more.
