The BI Invention You’ve Never Heard Of

BI Review Online, November 8, 2007

John Zuchero

Would you pay good money for a data structure that made gathering business intelligence (BI) data quicker, cheaper and easier? While this may sound like a no-brainer, some computer scientists at Unisys Corporation have had a difficult time getting the BI community to take notice of the positive results of their newest discovery.

Over the past few years, computer software engineer and theoretical mathematician Jane Mazzagatti has developed and patented a new data structure. She and her colleagues have shown that it can find not only simple relationships among data but also more complex, harder-to-find relationships in vast real-time data streams. And by real-time data they mean real-time - answers to queries may change as more and more data are introduced into the structure, much as we change our perceptions and decisions when new facts are introduced.

Mazzagatti calls this new data structure the Triadic Continuum, in honor of the theories and writings of Charles Sanders Peirce, one of the lesser-known scientific geniuses of the late 19th century. Peirce, who is recognized as the father of pragmatism, is also known for his work in semiotics, the study of thought signs. Using Peirce’s theoretical writings on how thought signs are organized into the structure of the human brain, Mazzagatti extrapolated a computer data structure that is self-organizing - in other words, a data structure that naturally organizes new data by either building on the existing data sequences or adding to the structure as new data are introduced.

She and her colleagues began their quest for a new data structure because of the perceived limitations of databases and data cubes. While both technologies have proved useful for gathering, storing and querying large amounts of business data, modifying, updating and adding information to an existing structure remains a problem. Data cubes, for example, are time-consuming to design and program, and queries are limited to the exact data in the cube at the time it is created. Every time the data change, the cube must be recreated, which is especially bothersome for transactional data that change constantly. Say a nationwide building supply company uses data cubes to identify potential trends in its business, and it takes weeks to create a cube. The data in the cube are then weeks old, or in some cases many months old, before the first query can be asked. This limitation virtually eliminates the ability to perform queries in real or near-real time. In Mazzagatti’s Triadic Continuum there is no need to recreate the structure; it changes naturally as new data are added or changed or old data are deleted. With new information, the structure continuously reorganizes without additional programmer help.
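
To make the contrast concrete, here is a minimal Python sketch. It is neither a real OLAP cube nor the Triadic Continuum; the names `build_snapshot` and `IncrementalTotals` and the sample sales records are hypothetical, chosen only to illustrate why a summary computed once from a fixed batch goes stale while one updated per transaction does not.

```python
from collections import Counter


def build_snapshot(transactions):
    """Precompute totals by (region, product) from a fixed batch, cube-style."""
    totals = Counter()
    for region, product, qty in transactions:
        totals[(region, product)] += qty
    # A plain dict copy: transactions arriving later are not reflected
    # until the whole snapshot is rebuilt.
    return dict(totals)


class IncrementalTotals:
    """Keep the same totals current by updating them as each transaction arrives."""

    def __init__(self):
        self.totals = Counter()

    def add(self, region, product, qty):
        self.totals[(region, product)] += qty

    def query(self, region, product):
        return self.totals[(region, product)]


# Hypothetical sample data for a building supply company.
batch = [("east", "lumber", 10), ("west", "nails", 5)]

snapshot = build_snapshot(batch)
live = IncrementalTotals()
for transaction in batch:
    live.add(*transaction)

# A new sale arrives after the snapshot was built.
live.add("east", "lumber", 3)

stale_answer = snapshot[("east", "lumber")]    # still the old total
current_answer = live.query("east", "lumber")  # includes the new sale
```

The snapshot can only be corrected by running `build_snapshot` again over all the data, which is the rebuild cost the article describes for cubes.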

In the BI world, this means that the traditional approach of assembling data in one place, generating cubes or OLAP queries, and turning information into knowledge by recognizing patterns is shortened dramatically. Mazzagatti and her colleagues believe that the time it takes to design and develop a BI solution, from identifying an information need through designing the schema and the cube to mining it, can be reduced by as much as 75 percent. This is possible because there is no need to create a schema and a cube, and the work of extracting, transforming and loading data is simplified. All of this makes it possible to create usable knowledge faster and more cheaply. It also moves BI from a purely strategic endeavor to one that can be used tactically, since there is no longer a significant time lag between knowing what information you need and getting it.

So what is this structure, and why is it so powerful? And why haven’t you heard of it already? Some of these three questions are easy to answer and some are difficult. Let’s start with the one you might assume is the most difficult but which is actually the easiest: what is this structure?

The conceptual model of the Triadic Continuum is quite simple. Mazzagatti and colleagues use the term “simple and elegant” in explaining how it is organized. Briefly, the structure is composed of a continuous tree-like arrangement of units called “triads.” In a traditional tree-like structure, one often sees nodes that are connected to one another by branches or paths. The triads that make up the Triadic Continuum can be visualized as three nodes arranged in a somewhat triangular formation. Node one is connected to node two by a bidirectional pointer, and node two is connected to node three by another bidirectional pointer. The pointers identify both where a node connects to and where it is connected from, so every node always knows its place within the continuum of branches through only two pointers. Theoretically, each individual particle of data occurs only once within the structure, and because of the organization of the bidirectional pointers the relationship of one datum to another is always known. While this may seem powerful, it’s not the only thing that makes this structure so important.
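
One rough way to picture such a structure in code is a tree whose nodes link both forward and backward. The Python sketch below is a plain prefix tree with parent links, not the patented Triadic Continuum, and it does not model triads themselves; `Node`, `SequenceTree` and the sample records are hypothetical names. It assumes each record is an ordered sequence of field values, reuses the nodes of any shared leading sequence, and creates new nodes only where a record diverges.

```python
class Node:
    """One stored value with links in both directions (illustrative only)."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent   # backward link: the node this one came from
        self.children = {}     # forward links: value -> child Node
        self.count = 0         # number of records that passed through here


class SequenceTree:
    """Prefix tree that grows as records arrive; no rebuild step."""

    def __init__(self):
        self.root = Node(None)

    def add(self, record):
        """Follow existing nodes where possible, add new ones where needed."""
        node = self.root
        node.count += 1
        for value in record:
            child = node.children.get(value)
            if child is None:
                child = Node(value, parent=node)
                node.children[value] = child
            child.count += 1
            node = child
        return node

    def count(self, prefix):
        """Number of records seen so far that start with `prefix`."""
        node = self.root
        for value in prefix:
            node = node.children.get(value)
            if node is None:
                return 0
        return node.count

    def path_to(self, node):
        """Recover a node's full sequence by walking the backward links."""
        values = []
        while node.parent is not None:
            values.append(node.value)
            node = node.parent
        return values[::-1]


# Hypothetical sample records: (region, product).
tree = SequenceTree()
tree.add(["east", "lumber"])
leaf = tree.add(["east", "nails"])   # reuses the existing "east" node
tree.add(["west", "lumber"])
```

Because counts are updated as each record is added, a query such as `tree.count(["east"])` reflects the data seen so far, and the parent links let any node recover its full path without a search. Unlike the structure described above, this simplified tree can store the same value in more than one branch; only shared leading sequences are stored once.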
