Would you pay good money for a data structure that made gathering business intelligence (BI) data quicker, cheaper and easier? While this may sound like a no-brainer, some computer scientists at Unisys Corporation have been having a difficult time getting the BI community to learn about the positive results of their newest discovery.
Over the past few years, computer software engineer and theoretical mathematician Jane Mazzagatti has developed and patented a new data structure that she and her colleagues have shown can find not only simple relationships among data but also more complex, harder-to-find relationships in vast real-time data streams. And by real-time data they mean real time - answers to queries may change as more and more data are introduced into the structure, similar to how we change our perceptions and decisions as new facts are introduced.
Mazzagatti calls this new data structure the Triadic Continuum, in honor of the theories and writings of Charles Sanders Peirce, one of the least well-known scientific geniuses of the late 19th century. Peirce, who is recognized as the father of pragmatism, is also known for his work in semiotics, the study of thought signs. Using Peirce's theoretical writings on how thought signs are organized into the structure of the human brain, Mazzagatti extrapolated a computer data structure that is self-organizing - in other words, a data structure that naturally organizes new data by either building on the existing data sequences or adding to the structure as new data are introduced.
She and her colleagues began their quest for a new data structure because of the perceived limitations of databases and data cubes. While both technologies have proved their usefulness in gathering, storing and querying large amounts of business data, there are issues associated with modifying, updating and adding information into an existing structure. For example, one of the main problems with data cubes is that they are time-consuming to design and program, and the queries are limited to the exact data in the cube at the time it is created. Therefore, every time the data in the cube changes, the cube must be recreated, which is especially bothersome if the data is transactional data that changes constantly. Say, for example, a nationwide building supply company uses data cubes to identify potential trends in its business, and it takes weeks to create a cube. The data in the cube is weeks old before it is ready to query. A time lag of weeks, if not many months in some cases, means that the data is outdated before the first query can be asked. Consequently, this limitation virtually eliminates the ability to perform queries in real or near-real time. In Mazzagatti's Triadic Continuum there is no need to recreate the structure; the structure changes naturally as new data is added or changed or old data is deleted. With new information, the structure continuously reorganizes without the need for additional programmer help.
In the BI world, this means that the traditional approach of assembling data in one place, generating cubes or OLAP queries, and turning information into knowledge by recognizing patterns is shortened dramatically. Mazzagatti and her colleagues believe that the time it takes to design and develop a BI solution, generally from identifying an information need through designing the schema and the cube to mining it, can be reduced by as much as 75 percent. This is possible because there is no need to create a schema and a cube, and the process of extracting, transforming and loading data is simplified. This all leads to the ability to create usable knowledge faster and cheaper. It also moves BI from just a strategic endeavor to one that can be used tactically, since there is no longer a significant time lag between knowing what information you need and how to get it.
So what is this structure, and why is it so powerful? And why haven't you heard of it already? Some of these three questions are easy to answer and some are difficult. Let's start with what you might assume is the most difficult question but which is actually the easiest: what is this structure?
The conceptual model of the structure of the Triadic Continuum is quite simple. Mazzagatti and colleagues use the terms "simple" and "elegant" in explaining how it is organized. Briefly, the structure is composed of a continuous tree-like arrangement of units, called triads. In a traditional tree-like structure, one often sees nodes that are connected to one another by branches or paths. The triads that make up the Triadic Continuum can be visualized as three nodes arranged in a somewhat triangular formation. Node one is connected to node two by a bidirectional pointer, and node two is connected to node three by another bidirectional pointer. The pointers identify to where and from where a node is connected - thus allowing all nodes to always know their relation within the continuum of branches through only two pointers. And, theoretically, each individual particle of data occurs only once within the structure, and because of the organization of the bidirectional pointers the relationship of one datum to another is always known. While this may seem powerful, it's not the only thing that makes this structure so important.
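To make the description of a triad concrete, here is a minimal sketch in Python of three nodes joined by two bidirectional pointers. It illustrates only what the paragraph above describes; it is not the patented structure or the K-Store implementation, and the names Node, link, make_triad, walk_forward and walk_back are hypothetical, chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node of a triad (hypothetical layout, for illustration only)."""
    label: str
    prev: Optional["Node"] = None  # "from where" this node is connected
    next: Optional["Node"] = None  # "to where" this node is connected


def link(a: Node, b: Node) -> None:
    """Join two nodes with one bidirectional pointer."""
    a.next = b
    b.prev = a


def make_triad(first: str, second: str, third: str):
    """Build three nodes joined by two bidirectional pointers."""
    n1, n2, n3 = Node(first), Node(second), Node(third)
    link(n1, n2)  # node one <-> node two
    link(n2, n3)  # node two <-> node three
    return n1, n2, n3


def walk_forward(node: Optional[Node]) -> List[str]:
    """Follow the 'to where' pointers from a node to the end."""
    labels = []
    while node is not None:
        labels.append(node.label)
        node = node.next
    return labels


def walk_back(node: Optional[Node]) -> List[str]:
    """Follow the 'from where' pointers from a node to the start."""
    labels = []
    while node is not None:
        labels.append(node.label)
        node = node.prev
    return labels
```

Because every link can be followed in both directions, any node can reach the other two without a separate index, which is the property the article attributes to the bidirectional pointers.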
At the core of BI are data mining and data modeling, which are both interested in uncovering the knowledge within data and information. However, it's increasingly clear that both of these disciplines are having a difficult time doing this. In traditional data structures, data is entered in a fixed format (tables, lists or trees) so that the data can be easily, reliably and consistently found. And, in a very real sense, data and information are discovered; programmers must write programs to enable users to query the data and write other specialized programs to search the distinct data structure in a prescribed way, searching for bits of data and particles of information until eventually something that matches the query criteria is found.
However, and this is significant, in the Triadic Continuum, data are learned into a structure whose format and organization systematically build a record or recording of the associations and relations between the particles of data. Besides that, the physical structure of the Triadic Continuum shapes the methods to obtain information and knowledge from the structure. So, instead of data and information being found, analyzed or discovered, it is already there waiting to be realized. About this unique aspect of the Triadic Continuum, Mazzagatti often says, "It's all in the structure." By this she means that the format and organization of the Triadic Continuum not only hold the representation of the data, but also the associations and relations between the data and the methods to obtain information and knowledge.
And while traditional databases deal mostly with finding data and information, the focus of the Triadic Continuum is on knowledge - the acquisition of useful and purposeful knowledge.
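The idea of learning data into a structure, so that answers are read from the structure and change as records arrive, can be loosely illustrated with an ordinary counting prefix tree. This is a swapped-in, well-known technique used here only as an analogy, not the Triadic Continuum itself; the class name CountingTrie, its methods and the sample sales records are all assumptions made for this sketch.

```python
class CountingTrie:
    """A plain counting prefix tree (an analogy, not the patented structure)."""

    def __init__(self):
        self.root = {"count": 0, "children": {}}

    def learn(self, record):
        """Add one record; shared prefixes reuse nodes instead of being copied."""
        node = self.root
        node["count"] += 1
        for value in record:
            node = node["children"].setdefault(
                value, {"count": 0, "children": {}}
            )
            node["count"] += 1

    def count(self, prefix):
        """Return how many learned records begin with the given values."""
        node = self.root
        for value in prefix:
            node = node["children"].get(value)
            if node is None:
                return 0
        return node["count"]


# Hypothetical sales records: (salesperson, day, item).
store = CountingTrie()
store.learn(("Bill", "Tuesday", "sofa"))
store.learn(("Bill", "Tuesday", "lamp"))
before = store.count(("Bill", "Tuesday"))

# A new record arrives; nothing is rebuilt, and the same query now
# reflects the new data.
store.learn(("Bill", "Tuesday", "sofa"))
after = store.count(("Bill", "Tuesday"))
```

In this sketch the counts are a by-product of insertion, so a query only walks a path that already exists rather than scanning the records, and the answer to the same query shifts as soon as a new record is learned.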
The third question is actually the most difficult to answer: why have you never heard of this invention before?
Interestingly enough, the idea to develop a new data structure started on a Pacific island during World War II when a young serviceman by the name of Eugene "Gene" Pendergraft was introduced to the writings of Charles S. Peirce. From Peirce's writings, Pendergraft became intellectually stimulated by the notion that thinking, reasoning and learning are based on biological structures that function through a series of physical operations.
After the war, Pendergraft went on to develop his own theories and in the early 1960s attempted to develop computerized language translation while co-directing a project at the University of Texas at Austin to use computers to translate German, Russian, Japanese and Chinese into English. Through this research Pendergraft theorized that machines, specifically computers, could be made to learn. However, computer memory at the time was too small to allow this. While he and his team were able to demonstrate a rudimentary form of mechanical translation, the project was halted when U.S. Air Force officials cancelled funding. With this setback, Pendergraft put his ideas of mechanized learning on hold until computer technology caught up to his prophetic thinking.
In the early 1990s, when Pendergraft thought the time was right for mechanized learning, he and a small group of programmers and entrepreneurs formed a company. They quickly realized that they needed to pursue a financial and technical relationship with a larger company. Of the five computer companies they contacted, only Unisys Corporation was interested.
Beginning in 1997, a team of engineers and scientists at Unisys Corporation began working on a prototype of Pendergraft's mechanized learning software. When Pendergraft unexpectedly died, others took over his role. But it was quickly discovered that no one understood Pendergraft's interpretation of Peirce.
Into this came Jane Campbell Mazzagatti. With an extensive background in computer hardware and software, degrees in theoretical mathematics and educational psychology, a deep personal interest in cognitive development, an unrelenting quest for knowledge and a strong personality, Mazzagatti was the right person to judge the validity of Pendergraft's interpretation of Peirce's theories. After extensive study of Peirce's writings, she realized that while Pendergraft had understood the import of Peirce's writings, he had not correctly seen how Peirce's triad might be implemented as a computer data structure.
By 1999, others began to agree with the conclusion that Pendergraft's interpretation was flawed, and project funding was halted. Although the program had ceased, Mazzagatti continued research on her own into how Peirce's sign theory could be adapted to create a logical structure composed of signs that could be used in computers. The structure that she finally conceived of and turned into an invention fits into the general computer category of data structures, devices for storing and locating information on computers.
Beginning in early 2000, Mazzagatti worked with another colleague to make her discoveries into a prototype that could be shown to others. This skunk-works project and its prototype were so successful that the management of Unisys R&D began funding a new program. Over the last few years, Mazzagatti's prototype has been developed into a product, which is called the Knowledge Store (K-Store).
So again, why haven't you heard about this product? Well, for a number of simple reasons. I believe that the number one reason is that this new technology is both revolutionary and evolutionary. By this I mean that in order to adopt it, a company must be willing to take a gigantic risk - not in the reliability of the technology - but in the change to process and infrastructure. This technology has the potential to be the next evolutionary step in databases, but it has been difficult to find those willing to transform their operations and infrastructure into the next stage of evolution without having seen someone else's success.
The second and equally important reason can be explained in terms of the dynamics and background of Unisys Corporation. Unisys, which was formed by a merger of the Burroughs and Sperry corporations, has tried to transform itself from a hardware vendor to a services-led business. However, while this transformation occurs, many of the R&D staff and the majority of its old guard management still see Unisys as a hardware company; software is often misunderstood, and efforts to market it are often poorly organized and lacking in innovation and vision. This is especially true with K-Store. Since its inception, K-Store's marketing and sales efforts have been stuck trying to find ways to brand and market this evolutionary product and to decide to whom to market it.
So, to help introduce this technology, Mazzagatti and I wrote a book called The Practical Peirce: An Introduction to the Triadic Continuum Implemented as a Computer Data Structure. In addition, Mazzagatti has taken to the convention circuit to explain her theory at international data structure conferences. This grassroots effort is our attempt to shed light on the best new BI invention you've never heard of.