Generating Data Marts On-Demand

  • October 01 2002, 1:00am EDT

Data marts have become essential tools in today's business environment: repositories of data gathered from many different sources, organized so that information can be analyzed and presented to a particular group of users in terms relevant to their work. Take a manager who has just purchased some external data to help with an important marketing campaign. For that data to be meaningful, it first needs to be merged with the company's internal data. The combined data pool then gives the company what it needs to drive the campaign.

The typical solution today is to extract internal data and store it in SAS files along with the external data. While this solution is well accepted, SAS files are more difficult to manipulate than data in a relational database. Moving to a relational database has its own limitation: it requires skills (logical database design, physical database design, physical database generation and loading) not normally possessed by SAS programmers. Recruiting a database administrator to provide these services may introduce a delay of several months.

What if, instead, the marketing manager could combine the internal and external data and immediately populate a data mart that is automatically defined simply by the act of requesting it? A pipe dream? Not at all. Thanks to today's innovative uses of meta data, companies have the potential to generate data marts in real time to meet the specific, short-term needs of individuals who require the right combination of data for important business purposes.

Three technologies are needed to generate on-demand data marts:

  • An accessible meta data repository that users can query.
  • An extract tool that can get to the data on demand.
  • A target DBMS that can automatically generate a database.

The starting point for on-demand data marts is an accessible meta data repository that reveals what data is available. The repository also needs to assist the user in the selection of individual data elements needed to support a business need, such as the marketing campaign I've already mentioned. Users need to be able to pull multiple data item collections as part of the same activity – for example, multiple items that define customer, product or sales. In logical database terms, this should translate into logical tables that reflect the collections of data items as defined by the user.

Where can such meta data repositories be found? Several vendors have products that provide this functionality, including Ab Initio and Ascential.
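
As a rough illustration of how a user's selection of data items might translate into logical tables, consider the minimal Python sketch below. The catalog entries, the `DataItem` fields and the function names are all hypothetical and invented for this example; they do not reflect the repository model of Ab Initio, Ascential or any other product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    name: str        # business name the user sees in the repository
    collection: str  # subject area the item belongs to, e.g. "customer"
    source: str      # where the item physically lives
    sql_type: str    # column type to use in the generated table


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    DataItem("customer_id", "customer", "data_warehouse", "INTEGER"),
    DataItem("customer_name", "customer", "data_warehouse", "TEXT"),
    DataItem("segment", "customer", "external_file", "TEXT"),
    DataItem("product_id", "product", "operational_db", "INTEGER"),
    DataItem("product_name", "product", "operational_db", "TEXT"),
    DataItem("sale_amount", "sales", "data_warehouse", "REAL"),
]


def logical_tables(selected_names):
    """Group the user's selected items into one logical table per collection."""
    by_name = {item.name: item for item in CATALOG}
    tables = defaultdict(list)
    for name in selected_names:
        item = by_name[name]  # raises KeyError for items not in the catalog
        tables[item.collection].append(item)
    return dict(tables)


def to_ddl(tables):
    """Turn each logical table into a CREATE TABLE statement."""
    statements = []
    for table, items in tables.items():
        columns = ", ".join(f"{i.name} {i.sql_type}" for i in items)
        statements.append(f"CREATE TABLE {table} ({columns})")
    return statements
```

Calling `to_ddl(logical_tables(["customer_id", "segment", "product_name"]))` would yield one statement for a `customer` table and one for a `product` table, so the act of selecting items is enough to define the tables.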

The extract tool should provide access to data in a variety of data sources. Data is typically stored in several places: a data warehouse, operational data stores, operational databases and flat files. The extract tool most likely will be a batch extractor, with specified extraction schedules. Some data stores, such as a data warehouse, can be extracted in near-real time. Others have to wait so that key business functions are not disturbed. Either way, the extraction should be scheduled automatically, and the schedule should be clearly communicated to the user. The source should be relatively transparent to the user, but the meta data descriptions must provide enough context for the user to know that the correct source is being tapped.

What extract tools exist? Not surprisingly, because such tools must be tightly linked to the meta data repository I just described, Ab Initio and Ascential are also examples of extract tool vendors.
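
The scheduling behavior described above, where some sources are extracted right away and others wait, might look something like the following Python sketch. The off-peak window (22:00 to 05:00), the source names and the rule that only the data warehouse is extracted immediately are assumptions made for illustration, not the behavior of any particular extract tool.

```python
from datetime import datetime, time

# Assumed policy: only the data warehouse may be extracted immediately.
IMMEDIATE_SOURCES = {"data_warehouse"}

# Assumed off-peak window for all other sources: 22:00 until 05:00.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(5, 0)


def next_extract_time(source, requested_at):
    """Return the earliest time at which the source may be extracted."""
    if source in IMMEDIATE_SOURCES:
        return requested_at
    t = requested_at.time()
    # Already inside the off-peak window: extract right away.
    if t >= OFF_PEAK_START or t < OFF_PEAK_END:
        return requested_at
    # Otherwise wait until the window opens later the same day.
    return datetime.combine(requested_at.date(), OFF_PEAK_START)


def schedule(sources, requested_at):
    """Build the per-source time schedule that is reported back to the user."""
    return {s: next_extract_time(s, requested_at) for s in sources}
```

For a request made at 2:30 p.m., `schedule(["data_warehouse", "operational_db"], ...)` would report the warehouse extract as immediate and the operational database extract as starting at 10 p.m., which is the kind of schedule the user needs to see up front.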

The target database has been a major stumbling block for designers in the past. To provide easier data analysis, this database should be relational. However, substantial technical expertise is required to design and create a relational database that has acceptable performance characteristics. The major technology advance in this area has been the introduction of database products that can effectively create in-memory databases. In turn, this technology delivers analysis performance without the need for physical database design in the traditional sense. It is well suited to short-lived databases that are needed quickly, but only for limited periods of time.

The delivery of this newer technology means that companies now have the capability to create very focused collections of data to meet specific analysis needs for a short period of time, eliminating the need to continue using an existing database once the original need has changed. This is the heart of on-demand data marts.

What databases exist that meet this need? One such product is the Sand Analytic Server from Sand Technology. Customers using it in combination with Ab Initio, Ascential or a similar tool are currently generating data marts as needed and discarding them when the analysis need is complete.
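
The create, analyze and discard life cycle can be sketched in a few lines of Python. This example uses the standard library's `sqlite3` module with an in-memory database purely as a stand-in; it is not the Sand Analytic Server, and the table names, columns and sample rows are invented for illustration.

```python
import sqlite3


def build_mart(tables):
    """Create an in-memory relational database from a simple table spec.

    tables maps a table name to (columns, rows), where columns is a list
    of (column_name, sql_type) pairs. No physical design step is involved.
    """
    conn = sqlite3.connect(":memory:")
    for name, (columns, rows) in tables.items():
        col_defs = ", ".join(f"{col} {sql_type}" for col, sql_type in columns)
        conn.execute(f"CREATE TABLE {name} ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
    return conn


# Hypothetical data: customer segments from an external file, sales from
# the data warehouse.
mart = build_mart({
    "customer": (
        [("customer_id", "INTEGER"), ("segment", "TEXT")],
        [(1, "gold"), (2, "silver")],
    ),
    "sales": (
        [("customer_id", "INTEGER"), ("amount", "REAL")],
        [(1, 100.0), (1, 50.0), (2, 20.0)],
    ),
})

# Analyze: total sales per customer segment.
totals = mart.execute(
    "SELECT c.segment, SUM(s.amount) "
    "FROM customer c JOIN sales s ON s.customer_id = c.customer_id "
    "GROUP BY c.segment ORDER BY c.segment"
).fetchall()

# Discard: closing the connection throws the whole data mart away.
mart.close()
```

Because the database lives only in memory, nothing remains to be maintained or cleaned up once the analysis is finished.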

Data warehouse managers may protest that analysis belongs on the data warehouse. In the broadest sense, I agree, based on the work I have done at Accenture. However, there seems to be a growing consensus that data warehouses and data marts serve two related but different purposes. In many situations today, business professionals have a short-term need for external data or customer data that must be combined with data warehouse data in order to be useful, yet the resulting collection should not then be stored on the data warehouse. For these kinds of needs, on-demand data marts are an excellent solution.
