Rip Van Winkle wakes up in the 1970s and wants to hear all the music he’s missed. He tells a salesperson at the music store what he wants, and he is asked some curious questions: “Would you prefer to hear just a few masterpieces over and over, or would you like to sample everything available at random? Do you expect the music to be portable? How’s your reception?” At the end of this strange interview, he’s given the choice of a console cassette player or a portable radio. Rip Van Winkle’s problem is that he wants to catch up on music; the solution he needs, an MP3 player loaded with thousands of songs, is a three-decade nap away. Sensibly, given the technological limitations of the day, he compromises on a solution.

 

Information visibility follows a similar storyline: compromising on a solution to the business problem because of the limitations of the underlying architecture. In the past, when you needed state-of-the-art information visibility and phoned your technology analyst for advice, you too had to answer a strange string of questions first: “Is your data structured or unstructured? How savvy are the users? Do you want to search it, report on it or analyze it?” Based on your answers, you would have been triaged to either the business intelligence (BI) expert or the enterprise search expert.

 

BI and enterprise search are based on architectures – the relational database management system (RDBMS) and the search engine’s inverted index – that also go back to the 1970s. Every architecture is bound to the hardware of its day, and computing power has increased a million-fold since then. With an architecture designed around newer technology, we can solve the problem of information visibility itself rather than compromising with partial solutions like search, analytics and reporting.

 

What is information visibility? It’s the ability to take the information we’ve invested billions of dollars in capturing, storing and processing and let anyone understand it, in support of everyday business decisions. That vision sounds familiar, but the industry has been compromising for so long that it has lost sight of the full vision. To see it fresh, look at some before-and-after pairs; in each, BI and search handle the first question well but are helpless on the second:

  • BI/search: “Is part #KD-68329 in inventory?”
    Information visibility: “Is there a good wheel bearing available for the landing gear I’m designing, or do I need to spec a new one?”
  • BI/search: “How much did same store sales increase year-over-year?”
    Information visibility: “How can I increase sales of grills this spring?”
  • BI/search: “What restaurants are within one mile of 101 5th Ave., New York, NY?”
    Information visibility: “Where should I buy a house for my job transfer to New York?”

No SQL statement, search query or report comes close to answering these information visibility questions. Nevertheless, workers answer questions just like those throughout the day. They are already sitting on the raw information that could help them get better answers if only they had better information visibility.

 

Information visibility questions can’t be answered by a computer at all – they can only be answered by a person. Consider the final question: “Where should I buy a house for my job transfer to New York?” Imagine a computer told you the answer was “123 Broadway, Apt 1, New York, NY.” You couldn’t possibly trust it. In fact, you wouldn’t even trust another person’s answer. Only someone steeped in the problem, exercising judgment based on their own experience, can answer it. That’s why the greatest investment in any information-intensive business is in its people, not in IT.

 

Yet computers can help far more than they do currently. How? They should help us make decisions by summarizing all the information relevant to our goal and by making it easy for us to iterate and make trade-offs as we come to understand our options. Zillow.com gives just one example of the new generation of information visibility applications. The wildly popular real estate Web site helps people quickly digest hundreds of millions of pieces of data, letting them search, analyze, browse, map, visualize and compare it without any training. Only after that immersion in rich information can we feel comfortable that we’ve made a good decision on where to move. So if a far better experience is possible, why isn’t it available within the enterprise?

 

Beyond RDBMS and Enterprise Search

 

The key to the next generation of information visibility is in new databases tailored to the problem. For years we’ve been hearing of specialty databases entering the market and failing, so a healthy skepticism is in order. After more than three decades, the RDBMS is still king, despite its birth on much earlier hardware. This is a testament to how optimal the relational model was, but great architecture only buys so many extra years. What lives much longer than code is the value of a standard. It’s the knowledge and training of thousands of database administrators (DBAs). It’s code and data reuse. It’s vendor consolidation. And for Michael Stonebraker, a father of the RDBMS and its greatest defender through the decades, its value as a standard puts the burden of proof on any upstarts proposing improvements. It’s not enough for a newcomer to be better than the relational database or even five times as good. It needs to be orders of magnitude better.

 

The RDBMS is a one-size-fits-all solution. When it was architected, it was optimized for the dominant use case of its day: processing business transactions, or online transaction processing (OLTP). Of course, we’re not still running the same relational database of the 1970s. In fact, newcomers have come along and endured. But rather than replacing the relational database, they were incorporated into the model. DataBlades are one such example, introduced by Stonebraker himself to support object-relational work with object classes like geospatial data. The work was folded into Informix, then replicated by Oracle and Sybase, and today it’s an add-on class. Multidimensional databases and online analytical processing (OLAP) are other examples. Hyperion’s OLAP made an early surge as a new class of database by overcoming a severe shortcoming in the relational model, especially for analyzing the multidimensional data needed for back-office financials. Then MicroStrategy introduced relational online analytical processing (ROLAP), successfully jamming aggregate tables into relational tables. Now all the major vendors have OLAP extensions, and Hyperion was buried and then acquired. The pattern is the same: a new piece of functionality proves its value and is then retrofitted into the core relational model.

 

Not all functionality can be retrofitted onto the relational database, however; sometimes the architectural barrier is so high that only a greenfield approach is possible. Stonebraker recently declared that his high bar for accepting a new database, measured in orders of magnitude, has finally been cleared. As he and his co-authors conclude in “One Size Fits All: An Idea Whose Time Has Come and Gone,” a landmark paper published in 2007 under an NSF grant, “We believe that the DBMS market is entering a period of very interesting times. There are a variety of existing and newly-emerging applications that can benefit from data management and processing principles and techniques. At the same time, these applications are very much different from business data processing and from each other - there seems to be no obvious way to support them with a single code line. The ‘one size fits all’ theme is unlikely to successfully continue under these circumstances.”1

 

So who clears the bar? Stonebraker envisions several areas, like stream-processing engines and data warehouse appliances. (For anyone who has followed his career, it won’t come as a surprise that he is behind companies competing in several of these areas, such as Vertica and StreamBase.) Information visibility is another ripe area.

 

Updating the Model

 

To understand why a new database is needed rather than an update to the RDBMS, it’s worth a quick refresher on how all databases are structured. Let’s consider a simplified three-part model and evaluate the familiar RDBMS against it:

  1. Data model. This is the externally visible language for describing information. For the RDBMS, this is the relational model based on tables and columns. It is a prescriptive model, meaning an administrator must structure data into this form. The model doesn’t gracefully accommodate text documents.
  2. Query mechanism. This is the externally visible query interface. For the RDBMS, this is SQL, a formal language based on relational algebra. SQL has a significant learning curve and few fans. (A minimal sketch of these first two layers follows this list.)
  3. Implementation architecture. This is how the data is stored, how queries are planned and evaluated, how indexes are stored and used, and how locking is managed. It’s everything about how the database does what it does. The RDBMS was architected to make efficient use of the hardware of its day, which assumed memory, disk and processors much less powerful than today’s. While it has been continuously updated, many of those original assumptions have left long-standing artifacts.
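
To make the first two layers concrete, here is a minimal sketch (mine, not from any vendor) using Python’s built-in sqlite3 module: the CREATE TABLE statement is the prescriptive data model, and the SELECT is the SQL query mechanism. The inventory schema is hypothetical; the part number is borrowed from the earlier example.

    import sqlite3

    # A prescriptive data model: the schema must be declared up front,
    # and every record has to fit these columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inventory (
               part_number TEXT PRIMARY KEY,
               description TEXT,
               quantity    INTEGER
           )"""
    )
    conn.execute(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        ("KD-68329", "wheel bearing", 12),
    )

    # The query mechanism: a formal SQL statement answers the narrow,
    # BI-style question "Is part #KD-68329 in inventory?"
    row = conn.execute(
        "SELECT quantity FROM inventory WHERE part_number = ?",
        ("KD-68329",),
    ).fetchone()
    print("in stock" if row and row[0] > 0 else "out of stock")

Note that nothing in this sketch helps with the broader question of whether that bearing is good enough for a new landing gear design; that judgment lives entirely outside the query.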

So how do the most basic requirements of information visibility stack up against that model? Because I’m demanding improvements measured in orders of magnitude, the answer can be sketched in broad strokes. The evaluation simplifies to weighing today’s information against the data model, today’s users and their goals against the query mechanism, and the newest hardware against the implementation architecture.

  1. Data model. Today’s enterprise information spans a range of structures and silos, from transactions spread across data warehouses and marts, to documents on file servers, to XML and data in packaged software from IBM, SAP, Microsoft, Oracle and others. Information is abundant and constantly changing, and it’s impossible to mandate that all of it be forced into a single prescriptive model with a master schema. The only way to keep up with the flux is for each record to be self-describing. Think XML or an equivalent flexible schema (see the sketch after this list).
  2. Query mechanism. A few analysts can train on SQL and power tools, but the rest of us expect a Web-like experience. Returning to the Zillow example, people need to navigate, analyze, visualize and search in order to get the big picture, the small picture, trends and forecasts. Imagine attempting that by ping-ponging across search engines, reports, OLAP and SQL. Only real estate investment trust analysts would bother. Instead, we need a rich user experience that anyone can use.
  3. Implementation architecture. Today, storage is nearly unlimited, CPUs are 64-bit quad-core chips or better, and RAM is inexpensive. We can architect for efficient processing on new iron, but also bring as much of that processing power as possible to bear on an immersive user experience.
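
As a rough illustration of what “self-describing” means in practice (the second record, its part number and its field names are invented for this sketch, not taken from any particular product), each record below carries its own tags, so a new attribute needs no master-schema change:

    import xml.etree.ElementTree as ET

    # Two self-describing records: each one names its own fields, so the
    # second part's extra attribute requires no master-schema change.
    records = [
        "<part><number>KD-68329</number><type>wheel bearing</type></part>",
        "<part><number>XR-10021</number><type>hydraulic seal</type>"
        "<material>nitrile</material></part>",
    ]

    for doc in records:
        part = ET.fromstring(doc)
        # The record itself tells us which fields it has; no fixed schema is assumed.
        print({child.tag: child.text for child in part})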

When we design fresh for today’s users, data and hardware, it’s clear there’s an opportunity to move several orders of magnitude beyond the RDBMS.

 

Technology analyst groups like IDC and Forrester Research have been tracking the emergence of these new technologies. For example, Forrester’s new report, “Search + BI = Unified Information Access,” notes not only that this convergence is a marriage of features, but also that the underlying databases have evolved beyond the RDBMS for information access.2

 

The Impact of the Information Access Database

 

The people who benefit most from these new databases don’t care much about architectures at all. For them, it doesn’t matter how it works so long as it works. As with the business transformation that accompanied the change from DOS to Windows, large new user populations can reach entirely new goals without giving any thought to what changed.

 

Because Stonebraker set the bar at orders of magnitude, the first movers are seeing the opportunity to gain competitive advantage measured in the hundreds of millions of dollars. Examples I’ve encountered include:

  • One of the largest professional services consultancies reported saving $500 million in a single year with a solution that gave them the visibility to efficiently staff tens of thousands of workers on projects.
  • A major automaker projects it will save more than $1 billion in direct materials spend by giving thousands of engineers better visibility into parts information spread across its enterprise systems.
  • An e-commerce destination increased revenue by hundreds of millions of dollars by helping its customers better connect with its products and services.

When workers ask for information visibility, IT no longer needs to ask them a curious string of questions as if they were Rip Van Winkle waking up in the 1970s. Instead, IT can simply solve their problem. 

 

References:

  1. M. Stonebraker, C. Bear, U. Cetintemel, M. Cherniack, T. Ge, N. Hachem, S. Harizopoulos, J. Lifter, J. Rogers and S. Zdonik. “‘One Size Fits All’: An Idea Whose Time Has Come and Gone.” Proc. CIDR, 2007.
  2. Boris Evelson, Matthew Brown, Erica Driver and Norman Nicolson. “Search + BI = Unified Information Access: Combining Unstructured and Structured Info Delivers Business Insight.” Forrester Research, 2008.

 
