Important Considerations for an International Customer Data Warehouse

Information Management Magazine, July 2004

William Laurent

Most companies with international lines of business, global offices or expatriated operations are now embarking on international data warehouse projects, both broad and narrow in scope. Global data warehouse structures, built from amalgamations of heterogeneous systems and databases on different platforms spread around the world, are now the norm. With Web-based data movement, mining and analysis, data boundaries have virtually dissolved. The world of data has become much smaller as many of Earth's remote areas log onto the Internet for business intelligence purposes. Entities that have expanded into truly intercontinental businesses, with complex 24x7 globally aware data, run the gamut from international manufacturing conglomerates to financial firms to telecommunication providers. Unfortunately, many global warehouses and reporting repositories still struggle to support quality global analysis, research, data consolidation, executive reporting and other types of data mining, whether approached from a core business, product or customer-oriented paradigm. The simultaneous distribution and publishing of data to autonomous and far-reaching locations is usually fraught with formidable difficulties. It is important to realize that the problems are not limited to bandwidth and language.

International data warehouse requirements can be extremely diverse. For instance, it must be decided early whether the data warehouse will be used primarily for high-level decision support reporting or for detailed historical data mining and exploration (optimized for statistical or actuarial analysis and drill-through/drill-down inquiries). You must know fully which elements will drive warehouse reporting and data dissections - what the patterns of analysis will be. In other words: Will your global data warehouse or repository be geared toward customers, products, financials or other components? There is no off-the-shelf model, database or application that matches 100 percent of the warehousing objectives of international businesses and their data distribution needs. Every global repository has to be built so that it reflects how information and data are used in the company. An effective international data warehouse will need to reflect and reinforce the core values of the organization itself. Thus, it is a good idea to get a general understanding of some of the pitfalls, problems and possibilities of international data warehousing before coming face to face with them at crunch time.

Data Latency

Keep a constant eye on data latency: data is commonly created in one location and then synchronized or replicated to numerous locations throughout the world. The more geographically diverse the systems and resources, the more elaborate the complications. Quality and performance controls are a must when trying to keep data up to date and consistent across countless cities and countries. Although more frequent refresh and replication cycles will shrink latency and waiting periods for data, they consume more network bandwidth and require more monitoring and performance tuning. Robust scheduling logic and checkpoints will be required in order for a round-the-world user base to receive measures and dimensions that are consistent across their organization's divisions.
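One simple form such a checkpoint might take is a freshness check that flags any replica whose last refresh is older than an agreed limit. The sketch below is only illustrative: the site names, the refresh times and the two-hour limit are assumptions, not values from any particular system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical replica sites and the UTC time each last completed a refresh.
last_refresh_utc = {
    "new_york": datetime(2004, 7, 1, 12, 0, tzinfo=timezone.utc),
    "london": datetime(2004, 7, 1, 11, 30, tzinfo=timezone.utc),
    "tokyo": datetime(2004, 7, 1, 6, 0, tzinfo=timezone.utc),
}


def stale_sites(last_refresh, now, max_lag):
    """Return the names of sites whose last refresh is older than max_lag."""
    return sorted(
        site for site, refreshed in last_refresh.items()
        if now - refreshed > max_lag
    )


# Assumed check time and tolerated lag; both would be set per organization.
now = datetime(2004, 7, 1, 12, 30, tzinfo=timezone.utc)
late = stale_sites(last_refresh_utc, now, timedelta(hours=2))
print(late)
```

Storing every refresh time in UTC keeps the comparison independent of where the check itself runs.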

The Customer

An effective global warehouse will give every office an iterative feedback loop that tracks the actions, trends and whims of a company's foreign and local customers. Be it billing, shipping, return authorizations, marketing or other segments - all information from day-to-day business operations will relate back to the customer.

People behave differently (eating habits, hygiene standards, commuting trends, banking preferences, etc.) throughout the world. However, the international data warehouse should have data elements that are common throughout global locations - ones that track the same granularity, habits, behavior and components of customers. In other words, the same behaviors should be tracked in the same way everywhere. This is important in order to spot customer trends and differences by locality. Only cross-country reciprocity and parallel congruency of data will give you a true picture of an entire customer base, helping you create strategies for targeted marketing pushes, speeding discovery of cross-selling opportunities and accelerating entry into untapped markets. Today everything is inter-country, from airlines/vacation travel to online dating to MP3 downloads. With an integrated cross-country viewpoint, your organization will start to understand why customers behave the way they do.

If you want to capture true global demographic trends and conduct serious business intelligence (BI), avoid the mistake of building one warehouse per country or continental region (the stovepipe approach). The goal is to have integrated data from around the world. Robust product lines will always straddle two or more continents, time zones (see Time Zone Issues section), currencies, regulations, etc. For example, a single ocean cruise excursion may encapsulate all of these characteristics in a single day's journey! It is vital to the spirit and architecture of the international data warehouse that shared global data is channeled into a primary repository. From here, all interested parties can be methodically provided with valuable data (via data marts aggregated along country lines and so on) for everything from high-level analysis to customer calls.
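As a rough illustration of that flow, the sketch below derives per-country summaries from one central set of rows rather than from separate per-country stores. The row layout, the country codes and the single amount_usd measure are all assumptions made for the example; a real repository would also have to handle currency conversion and far richer dimensions.

```python
from collections import defaultdict

# Hypothetical rows already integrated into the central repository.
central_repository = [
    {"customer_id": 1, "country": "JP", "amount_usd": 120.0},
    {"customer_id": 2, "country": "US", "amount_usd": 80.0},
    {"customer_id": 1, "country": "JP", "amount_usd": 30.0},
    {"customer_id": 3, "country": "DE", "amount_usd": 50.0},
]


def build_country_marts(rows):
    """Aggregate central rows into one summary record per country."""
    marts = defaultdict(lambda: {"order_count": 0, "total_usd": 0.0})
    for row in rows:
        mart = marts[row["country"]]
        mart["order_count"] += 1
        mart["total_usd"] += row["amount_usd"]
    return dict(marts)


marts = build_country_marts(central_repository)

# The global view stays available because every mart comes from the same source.
global_total = sum(mart["total_usd"] for mart in marts.values())
print(marts["JP"], global_total)
```

Because the marts are derived views, a cross-country question can always be answered from the central rows even when no single mart covers it.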

Time Zone Issues

International data warehouses require thorough and carefully planned time zone management because most enterprises span multiple zones. As data is synchronized, scrubbed, transformed, distributed and shared, data elements will invariably get out of phase with respect to time. As physical distances increase, problems with real-time and batch synchronization multiply, meaning that time zone problems need to be addressed in distribution schedules, data models, data storage and replication/integration plans.
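One common way to keep data elements in phase is to convert every local time to a single reference zone, such as UTC, before comparing or storing it. The sketch below assumes fixed offsets for Tokyo and New York as they stood in July 2004; a production system would more likely rely on a maintained time zone database so that daylight saving changes are handled automatically.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for July 2004 (Tokyo is UTC+9, New York on EDT is UTC-4).
TOKYO = timezone(timedelta(hours=9))
NEW_YORK_EDT = timezone(timedelta(hours=-4))


def to_utc(local_dt):
    """Convert a time zone aware datetime to UTC."""
    return local_dt.astimezone(timezone.utc)


# Two hypothetical sales that carry different local calendar dates.
tokyo_sale = datetime(2004, 7, 2, 9, 0, tzinfo=TOKYO)
new_york_sale = datetime(2004, 7, 1, 20, 0, tzinfo=NEW_YORK_EDT)

tokyo_utc = to_utc(tokyo_sale)
new_york_utc = to_utc(new_york_sale)

# Once normalized, both sales turn out to be the same instant.
print(tokyo_utc.isoformat(), new_york_utc.isoformat())
```

Keeping the original local time alongside the normalized value preserves the business meaning of "the day the sale happened" for each office.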

Time stamping strategies are often the best way to overcome time zone processing differences, because they help track when a transaction occurred or became valid/invalid. It is not uncommon to use three or more time stamps in order to track data movement from the main repository back to the source systems of record. Consider the following time stamp fields that may occur in a warehouse fact table that models global transactions: add_timestamp will capture the local time zone date/time that the transaction was added to the main warehouse; batch_timestamp will capture the time zone date/time that the batch that loads the main warehouse started; and source_add_timestamp will contain the time zone date/time that the transaction of record took place in the source system. This sort of approach can be extended, scaled up or scaled down. Most financial measures should have multiple time stamps, or multiple surrogate foreign or primary keys that connect back to a verbose DATE/TIME dimension table. This table will track many things beyond simple date and time constructs. Holidays, Julian dates, days of the week and financial quarters can all be included in the mix. To avoid complicated SQL commands when navigating through layers of time, implement most of this time stamping and date calculation logic during ETL processing, not during end-user queries.
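The three time stamp fields and the date dimension lookup might be populated during the load step roughly as follows. This is a sketch under stated assumptions: the FactRow layout, the YYYYMMDD surrogate key, the calendar-quarter rule and the choice to key the row on the source system's local date are illustrative, and "Julian date" is taken here to mean the ordinal day of the year.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


@dataclass
class FactRow:
    transaction_id: int
    amount: float
    source_add_timestamp: datetime  # when the source system recorded it
    batch_timestamp: datetime       # when the warehouse load batch started
    add_timestamp: datetime         # when the row was added to the warehouse
    date_key: int                   # surrogate key into the date dimension


def date_dimension_row(ts):
    """Precompute date attributes so end-user queries need no date math."""
    d = ts.date()
    return {
        "date_key": d.year * 10000 + d.month * 100 + d.day,  # assumed YYYYMMDD
        "day_of_week": DAY_NAMES[d.weekday()],
        "day_of_year": d.timetuple().tm_yday,  # ordinal "Julian" day
        "quarter": (d.month - 1) // 3 + 1,     # calendar quarter, an assumption
    }


def load_fact(transaction_id, amount, source_ts, batch_ts, load_ts):
    """Stamp a transaction during ETL, keyed on the source's local date."""
    key = date_dimension_row(source_ts)["date_key"]
    return FactRow(transaction_id, amount, source_ts, batch_ts, load_ts, key)


# Hypothetical transaction recorded in Tokyo and loaded by a later UTC batch.
tokyo = timezone(timedelta(hours=9))
source_ts = datetime(2004, 7, 1, 23, 30, tzinfo=tokyo)
batch_ts = datetime(2004, 7, 1, 18, 0, tzinfo=timezone.utc)
load_ts = datetime(2004, 7, 1, 18, 5, tzinfo=timezone.utc)

row = load_fact(1001, 250.0, source_ts, batch_ts, load_ts)
dim = date_dimension_row(source_ts)
print(row.date_key, dim)
```

With the attributes stored in the dimension row, a query for third-quarter Thursdays becomes a simple join and filter instead of date arithmetic in SQL.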
