
Transform Early and Often

February 16, 2006, 1:00am EST

We're all familiar with the ETL acronym - extract, transform, and load - which has been around since business intelligence (BI) was invented as a discipline. Yet the underlying assumption - that BI systems must derive their refined data from records previously written by operational systems - is a product of a 10-year-old worldview.

In the mid-90s, the only place data turned up was in operational systems, so it made a lot of sense to distill information from enterprise resource planning (ERP) and customer relationship management (CRM) systems. But advances in raw processing power are starting to call this bedrock assumption into question. Why not just write the BI-friendly format immediately and be done with it? A number of vendors are starting to do just that. Maybe not enough to call it a wave - but certainly enough to call it an interesting ripple.
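To make the architectural point concrete, here is a minimal sketch of "writing the BI-friendly format immediately." Everything in it - the function name, the event fields, the JSON-lines file - is a hypothetical illustration, not any particular vendor's design: the operational system emits an analytics-ready record at transaction time, so no later extract-and-transform pass is needed.

```python
import json
import time

def record_sale(customer_id, sku, amount, analytics_log):
    """Hypothetical handler: write the BI-friendly record at
    transaction time instead of leaving it for a nightly ETL run."""
    event = {
        "event_type": "sale",            # already categorized for analysis
        "customer_id": customer_id,
        "sku": sku,
        "amount_usd": round(amount, 2),  # already in the reporting unit
        "ts": time.time(),               # already timestamped for trending
    }
    # One append per event; nothing left to extract or transform later.
    analytics_log.write(json.dumps(event) + "\n")

with open("sales_events.jsonl", "a") as log:
    record_sale("C-1042", "SKU-7", 19.99, log)
```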

From Collating Snapshots to Monitoring Streams

This change in BI architecture is occurring due to a general trend I've talked about in the past - the shift from snapshots to streams. Because it was too difficult to do otherwise, companies made do with snapshots: surveying their customer base once per year or writing a customer record only when a really significant operational event occurred, such as the customer buying a product. An enterprise didn't track a customer's offhand comment to a sales rep because it was too much work.

Today, however, because so much customer activity is digital, companies can track a whole stream of events - using a point of sale (POS) system to log the products a customer purchased, or a Web analytics system to monitor the pages that customer traversed on the company's Web site. These details are no longer a series of isolated snapshots of behavior, but rather a steady stream of information. With processing power increasing and storage costs dropping, vendors are starting to ask, "Why analyze a huge set of snapshots when we can monitor the stream in real time as it goes by?" This fundamental shift in viewpoint is giving birth to a variety of specialized network appliances that do exactly that. Perhaps a few examples will clarify what I mean. I've omitted the vendor names to drive home the point that the important takeaway is the architectural principle, not who's selling what.
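The snapshot-versus-stream distinction is easy to show in code. In this sketch (field names and event kinds are invented for illustration), the customer is represented not by one record per year but by every observable touchpoint, in order, as it happens:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One item in a customer's event stream (illustrative fields)."""
    customer_id: str
    ts: float        # Unix timestamp
    kind: str        # e.g., "page_view", "pos_purchase"
    detail: str

# A snapshot is one record per customer per year; a stream is every
# touchpoint the systems can observe, as it happens.
stream = [
    Event("C-1042", 1139875200.0, "page_view", "/products/widget"),
    Event("C-1042", 1139875260.0, "page_view", "/cart"),
    Event("C-1042", 1139875320.0, "pos_purchase", "SKU-7"),
]

# Stream-oriented questions become simple scans instead of joins
# across periodic snapshot tables.
views_before_buying = sum(1 for e in stream if e.kind == "page_view")
print(views_before_buying)  # 2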

Two Examples

One solution monitors visitor behavior on a Web site. It watches the back-and-forth HTTP stream via a sniffer installed on a network port or at the Web server, using a set of rules to rapidly convert the huge data stream into tiny "signatures." Using a process of semantic compaction, the software can compress a pattern of 40 file downloads - signifying that the user looked at two Web pages, for example - into a short code meaning "Closing an Account - Step 1." These signatures track both customer actions (e.g., placed item in shopping cart) and intent (e.g., spent more than one minute reading the product page).
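A minimal sketch of that rule-based compaction follows. The rule table, signature codes, and URL paths are all made up for illustration - a real appliance matches far richer patterns (dozens of file requests, dwell times) - but the principle of collapsing a verbose HTTP stream into short semantic codes is the same:

```python
# Hypothetical rule table: (URL-path prefixes, signature code), in
# priority order. More specific prefixes come first.
RULES = [
    (("/account/close/step1",), "CLOSE_ACCT_STEP_1"),
    (("/cart/add",), "ADDED_TO_CART"),
    (("/product/",), "VIEWED_PRODUCT"),
]

def compact(http_paths):
    """Collapse a stream of HTTP request paths into short signatures."""
    signatures = []
    for path in http_paths:
        for prefixes, sig in RULES:
            if any(path.startswith(p) for p in prefixes):
                if not signatures or signatures[-1] != sig:
                    signatures.append(sig)  # drop repeats (images, CSS)
                break
    return signatures

raw = ["/product/widget", "/product/widget/img1.png",
       "/cart/add?sku=7", "/account/close/step1"]
print(compact(raw))
# ['VIEWED_PRODUCT', 'ADDED_TO_CART', 'CLOSE_ACCT_STEP_1']
```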

By storing the compact signatures rather than the raw data, the software is incredibly space efficient. For example, it can store the results of 100 million sessions within only 500GB of disk space.
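A quick back-of-envelope check (assuming decimal gigabytes) shows what that figure implies per session:

```python
sessions = 100_000_000
disk_bytes = 500 * 10**9            # 500GB
print(disk_bytes / sessions)        # 5000.0 -> roughly 5KB per session
```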

A second example is a solution that can replay an online user's actions. By storing the data that streamed past the network tap, it can recreate a user session whenever the business asks. For example, business analysts can 1) replay a single user's online session or 2) watch multiple sessions at a critical process step, such as the "search results returned" page.
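Here is a minimal sketch of that replay idea, with hypothetical names throughout: raw captured requests are grouped by session ID so a single session can be stepped through in order, and sessions that reached a given step can be pulled up side by side.

```python
from collections import defaultdict

# Hypothetical replay store: raw captured requests grouped by session.
sessions = defaultdict(list)

def capture(session_id, ts, request):
    """Called as traffic streams past the network tap."""
    sessions[session_id].append((ts, request))

def replay(session_id):
    """Step through one user's session in the order it happened."""
    for ts, request in sorted(sessions[session_id]):
        print(f"{ts:>6.1f}  {request}")

def sessions_at_step(step):
    """Find every session that reached a given step, e.g. the
    search-results page, so analysts can watch them together."""
    return [sid for sid, events in sessions.items()
            if any(step in req for _, req in events)]

capture("sess-9", 1.0, "GET /search?q=widget")
capture("sess-9", 4.5, "GET /results?page=1")
replay("sess-9")
print(sessions_at_step("/results"))  # ['sess-9']
```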

These capabilities help an enterprise understand the user actions behind metrics such as "Shopping Cart Abandonment: 32 Percent" and "Fraud: 8 Percent." Analysts can put themselves in the user's shoes and figure out how to make the site easier for loyal customers to navigate - and harder for fraudsters to fool.

Because these applications capture customer behavior in real time, they aren't the classic, "wait-an-hour, wait-a-week, wait-a-month for insight" BI solutions that we're used to. If it wishes to, an enterprise can analyze visitor behavior seconds after it has occurred - an ability that is crucial if the business is focused on preventing fraud, for example.
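The fraud case shows why seconds matter. In this sketch - the signature code and alert action are illustrative stand-ins - each signature is evaluated the moment it arrives, rather than sitting in a staging table until a batch ETL window:

```python
def on_event(session_id, signature):
    """Evaluate each signature as it streams in, not hours later
    in a nightly batch. The rule here is purely illustrative."""
    if signature == "CLOSE_ACCT_STEP_1":
        alert(session_id)

def alert(session_id):
    # Stand-in for paging a fraud analyst or freezing the session.
    print(f"ALERT: review session {session_id} immediately")

on_event("sess-9", "CLOSE_ACCT_STEP_1")
```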

In Short, Instant Insight by Storing Only the Essentials

What we're witnessing is that systems themselves - not just humans - are starting to suffer from a case of information overload. By screening out extraneous data and storing only the salient points, these specialized appliances tightly compress the extraction and transformation steps, delivering virtually instant insight to the corporation. In a "gotta have the info now" world, that stream-oriented analysis capability is not a bad thing.
