When you start to create a data warehousing solution these days, you have myriad options for delivering it. Determine your requirements by constructing a delivery plan with an end goal in sight, and build to that end goal. The end goal could be anything from giving a sales manager the knowledge of who her best saleswomen are to helping a buyer predict clothing patterns or helping a college dean decide the curriculum to offer for the coming year.
Once you have your end goal in sight, develop a sound technical architecture for the delivery. Technical architecture planning has come about through necessity. In the early days of computing, we had no need for it because there were no interconnections to other computers, customers, suppliers, etc. Now, however, a computer is deemed nearly useless if it is not connected. Because there are so many different types of computers, there are a great many options for connecting them. When you need to deliver a solution that draws from the strengths of many different types of computing, you have to put together an architecture plan that tightly integrates those systems.
Technical architecture is the collection of the components that make up the network of your computers. It can include essentially any hardware; however, for the purposes of this article, we will focus on the servers as well as the disk arrays that connect to them for storage.
The key to delivering a sound data warehousing solution is selecting the right parts to fit together. Depending on the trade-offs you are making in your delivery, you will need to investigate many options. In data warehousing, the major trade-off we work with is storage versus processing. If you store pre-aggregated data, you save on the processing side. If you store nothing pre-aggregated, you pay for it at runtime.
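The storage-versus-processing trade-off can be sketched in a few lines of Python. This is only an illustration: the sales rows, the `SALES` and `REGION_TOTALS` names and the two query functions are invented for this sketch and are not part of any particular warehouse product. One function scans every row on each query (no extra storage, more processing); the other reads from a total computed once and kept (extra storage, almost no processing).

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, amount). Invented for illustration.
SALES = [
    ("east", "bread", 120.0),
    ("east", "rolls", 80.0),
    ("west", "bread", 200.0),
    ("west", "rolls", 50.0),
    ("east", "bread", 30.0),
]


def total_at_runtime(rows, region):
    """Store nothing extra: scan every fact row each time the question is asked."""
    return sum(amount for r, _product, amount in rows if r == region)


def build_region_aggregate(rows):
    """Pay once in storage: keep one pre-computed total per region."""
    totals = defaultdict(float)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


# The pre-aggregated store; it must be rebuilt whenever SALES changes.
REGION_TOTALS = build_region_aggregate(SALES)


def total_from_aggregate(region):
    """Answer the same question with a single lookup instead of a scan."""
    return REGION_TOTALS.get(region, 0.0)
```

Both functions return the same answer for a given region. The difference lies in where the cost falls: the scan grows with the number of fact rows on every query, while the aggregate costs disk or memory and has to be refreshed when the underlying data changes.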
The following examples illustrate the types of options you might have to investigate when creating a sound data warehousing solution.
Example One. A credit card company needs to understand who is using its cards fraudulently. The solution is to deliver a system that will identify behavior that is consistent with fraudulent use. Focus on what the processing will need to be and install a technical architecture to accomplish it. In this case, a neural network would be of great benefit because of its pattern-recognition capability. Neural networks are a data mining technique; self-organizing maps are one well-known variety. A network is trained by running repeatedly through a set of data known to be associated with fraudulent use so that it can "learn" to identify patterns of fraud. This type of processing needs a lot of computing power, not necessarily disk space. Consequently, the technical architect will want to make his or her trade-offs on the side of adding more processing and memory to the computer and not worry so much about the disk space.
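To make the idea of "learning" from a training set concrete, here is a minimal sketch of a single artificial neuron (a perceptron) in Python. It is a deliberate simplification and not the self-organizing map mentioned above, nor any vendor's fraud model. The two features (a scaled transaction amount and a flag for an unusual location), the sample values and the function names are all assumptions made for the illustration.

```python
# Hypothetical training set. Each sample is
# (scaled transaction amount, 1 if the location is unusual else 0).
SAMPLES = [(0.1, 0), (0.2, 0), (0.9, 1), (0.8, 1)]
LABELS = [0, 0, 1, 1]  # 1 = known fraudulent use, 0 = legitimate


def activation(weights, bias, features):
    """Weighted sum of the inputs plus a bias term."""
    return bias + sum(w * x for w, x in zip(weights, features))


def predict(weights, bias, features):
    """Flag the transaction as fraud (1) when the activation is positive."""
    return 1 if activation(weights, bias, features) > 0 else 0


def train_perceptron(samples, labels, epochs=20, rate=0.1):
    """Repeatedly pass over the training set, nudging weights after each mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            bias += rate * error
            weights = [w + rate * error * x for w, x in zip(weights, features)]
    return weights, bias


WEIGHTS, BIAS = train_perceptron(SAMPLES, LABELS)
```

Even this toy makes repeated passes over the whole training set and does arithmetic on every sample in every pass. A realistic network has far more inputs, neurons and training records, which is why the workload leans on processor and memory rather than on disk.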
Example Two. A food-product company's bakery department wants to understand the breakdowns of its sales. Because the product is not seasonal, product sales occur in every region throughout the year and all of the customers buy varying quantities of all of the products. Consequently, there is a great deal of empirical data with which to analyze sales. This situation calls for a solution that will allow browsing up and down many deep dimensions of data. This requires creating many pre-aggregated data stores and tying the front-end browser to the data for each dimension. This is best accomplished by building an aggregate table for each of the various paths a manager would want to traverse. The technical architecture to support this type of application would be one with a large amount of fast disk, attached to the servers with fast fiber-optic connections. Then, that data should be read into vast amounts of memory because its navigation will be focused on those straightforward dimensions. This kind of analysis will not require huge amounts of processing power.
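The number of aggregate tables in a design like this grows quickly with the number of dimensions, which is what drives the need for disk. The sketch below, in Python, builds one aggregate per combination of dimensions for a handful of invented fact rows. The dimension names, the rows and the function names are assumptions for illustration only; a real warehouse would build only the combinations its users actually navigate.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows, invented for illustration.
DIMENSIONS = ("region", "product", "month")
FACTS = [
    {"region": "east", "product": "bread", "month": "jan", "units": 10},
    {"region": "east", "product": "rolls", "month": "jan", "units": 5},
    {"region": "west", "product": "bread", "month": "jan", "units": 7},
    {"region": "east", "product": "bread", "month": "feb", "units": 12},
    {"region": "west", "product": "rolls", "month": "feb", "units": 3},
]


def grouping_sets(dimensions):
    """Every subset of the dimensions, from the grand total () to full detail."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


def build_aggregates(facts, dimensions, measure):
    """Build one aggregate table (a dict) per grouping set."""
    tables = {}
    for dims in grouping_sets(dimensions):
        table = defaultdict(int)
        for row in facts:
            key = tuple(row[d] for d in dims)
            table[key] += row[measure]
        tables[dims] = dict(table)
    return tables


AGGREGATES = build_aggregates(FACTS, DIMENSIONS, "units")
```

With three dimensions there are already eight aggregate tables (two to the power of three), and each added dimension doubles that count. Browsing then becomes a matter of lookups such as `AGGREGATES[("region",)][("east",)]`, which cost storage and memory but very little processing.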
Example Three. A managed-care company wants to understand which of its drug protocols are most effective from both a clinical-efficacy standpoint and a financial-optimization standpoint. What happens in this case is somewhat similar to Example Two, but the sparseness of the aggregated data makes its behavior quite different. Where Example Two can benefit from a fairly regimented, multidimensional cube approach to the data, this company requires more frequent hits to a database because the data does not fit nicely into a cube. Consequently, the solution for this type of application requires processing power and a lot of input-output operations in the database. This type of application will call for a more complex data model than Example Two, one requiring more joins in the database, which in turn will require more processing power. The technical architecture to deliver this will require disk space, processing power and memory. Because all three cannot be delivered without cost, the downside of this application will be slower response times.
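One rough way to judge whether data "fits nicely into a cube" is to compare the number of dimension combinations that actually hold data with the number a full cube would reserve. The Python sketch below computes that ratio. The `cube_density` function, the dimension names and both sample data sets are hypothetical, and the measure is a simple heuristic rather than an established sizing method.

```python
def cube_density(facts, dimensions):
    """Fraction of possible dimension combinations that actually contain data."""
    if not facts:
        return 0.0
    # Cells a full cube would reserve: product of distinct values per dimension.
    possible = 1
    for d in dimensions:
        possible *= len({row[d] for row in facts})
    # Cells that are really populated.
    populated = len({tuple(row[d] for d in dimensions) for row in facts})
    return populated / possible


# Hypothetical sparse data: each protocol is used for one diagnosis only.
PROTOCOL_FACTS = [
    {"protocol": "P1", "diagnosis": "D1", "clinic": "north"},
    {"protocol": "P2", "diagnosis": "D2", "clinic": "north"},
    {"protocol": "P3", "diagnosis": "D3", "clinic": "south"},
    {"protocol": "P4", "diagnosis": "D4", "clinic": "south"},
]

# Hypothetical dense data: every product sells in every region.
BAKERY_FACTS = [
    {"region": "east", "product": "bread"},
    {"region": "east", "product": "rolls"},
    {"region": "west", "product": "bread"},
    {"region": "west", "product": "rolls"},
]

SPARSE = cube_density(PROTOCOL_FACTS, ("protocol", "diagnosis", "clinic"))
DENSE = cube_density(BAKERY_FACTS, ("region", "product"))
```

In the bakery-style data every cell is filled, so a cube wastes nothing. In the protocol-style data only 4 of 32 possible cells are populated, so most of a pre-built cube would be empty, and answering questions through joins against the database becomes the more practical route.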
In these examples, we see some of the implications an application's end goal has on the technical architecture needed to support its delivery. Technical architecture planning is the process of working through the requirements of the system to be delivered and selecting the technical components necessary for the solution.