Designing a Technology Infrastructure for Analytics

Published
  • April 01 2005, 1:00am EST

Overview: This article provides an overview of the four major technology selection decisions that need to be made by technology managers and directors who are responsible for designing a technology infrastructure that will support the analytical processing needs of the business.

The use of data analytics is expanding rapidly as companies attempt to leverage huge volumes of data in their operational systems and data warehouses to attain greater profits. To use data analytics effectively and efficiently, it is critical to design the right technology infrastructure. To a large extent, the speed at which queries and models run, the ease with which analysts can use their analytical tools and how quickly the infrastructure can be customized for users all depend on the technology infrastructure.

This article provides an overview of the four major technology selection decisions that need to be made by technology managers and directors who are responsible for designing a technology infrastructure that will support the analytical processing needs of the business. This article will also help business managers understand some of the complexities and challenges in designing and building an efficient analytical processing infrastructure.

A business may require a separate infrastructure to meet data analytics needs because the requirements are very different from those of transaction processing. Most companies have made significant technology investments in the latter and have optimized their technology infrastructure for that sole purpose.

Making technology selection decisions is an iterative process that is performed jointly by business users and technology professionals. Your users may tell you that they need 100% uptime and high performance for all their applications. However, delivering such a technology platform may be much more expensive than users anticipated and, more importantly, more than they or your finance organization can afford to pay. You will need to strike a balance between what users say they require and what they truly need. Because you will have a broader knowledge of technology and cost drivers, you should be able to guide users and help define the business requirements. This process often requires two to three iterations.

Business Requirements

As a first step, you should capture the business requirements. You need to gain a good understanding of the functionality required by the users of the system. In an analytic environment, your users would typically need to load, merge, clean, profile and segment data in multiple ways and then create predictive models employing the data. You need to decide who the likely users of this infrastructure are. Are they business users who prefer to have GUI-based applications to guide their analysis? Are they technical users with knowledge of SQL and other programming languages? The size of typical data sets today and an estimate of how they are likely to grow over the next three to five years are critical information to capture. You also need to consider business policies regarding data privacy and security.

Given that designing and implementing the right infrastructure for analytics is often a significant investment of time and money, you should plan on designing an infrastructure that will be adequate to meet needs for three to five years before major, additional upgrades are necessary.
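The sizing estimate described above can be sketched as a simple compound-growth projection. This is only an illustration: the starting size (2,000 GB), the 40% annual growth rate and the function name `project_capacity_gb` are assumptions made for the example, not figures from any real environment.

```python
from __future__ import annotations


def project_capacity_gb(current_gb: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected data size at the end of each year.

    Assumes compound growth at a constant annual rate (e.g., 0.40 for 40%).
    """
    if current_gb < 0 or years < 0:
        raise ValueError("current_gb and years must be non-negative")
    sizes = []
    size = current_gb
    for _ in range(years):
        size *= 1 + annual_growth
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Hypothetical inputs: 2,000 GB today, growing 40% per year for five years.
    for year, size in enumerate(project_capacity_gb(2000, 0.40, 5), start=1):
        print(f"Year {year}: {size:,.0f} GB")
```

Running the projection for both the three-year and five-year horizons gives a range to size against, rather than a single number.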

Architecture

Based on the business requirements, you will need to create the architecture for your infrastructure. The architecture will require you to make four major technology selection decisions - application(s), database, operating system and storage. Note that there are dependencies between all four of these decisions; therefore, you should not plan to make a decision on each in a sequential manner, but rather start with a broad universe of options for all four components and narrow them down through an iterative process to the final solution set.
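One way to picture the dependencies between the four decisions is to enumerate every combination of candidates and keep only those in which the application, database and storage option all support the same operating system. The sketch below uses placeholder names (`app_a`, `db_x` and so on) and an invented support matrix; it does not describe any real product's platform support.

```python
from itertools import product

# Hypothetical candidates, each mapped to the operating systems it supports.
APPLICATIONS = {"app_a": {"windows", "unix"}, "app_b": {"windows"}}
DATABASES = {"db_x": {"unix"}, "db_y": {"windows", "unix"}}
STORAGE = {"das": {"windows", "unix"}, "san": {"unix"}}
OPERATING_SYSTEMS = ["windows", "unix"]


def compatible_stacks(apps, dbs, oses, storage):
    """Return (application, database, os, storage) tuples that share a supported OS."""
    stacks = []
    for app, db, os_name, store in product(apps, dbs, oses, storage):
        if (os_name in apps[app]
                and os_name in dbs[db]
                and os_name in storage[store]):
            stacks.append((app, db, os_name, store))
    return stacks


if __name__ == "__main__":
    for stack in compatible_stacks(APPLICATIONS, DATABASES, OPERATING_SYSTEMS, STORAGE):
        print(stack)
```

Each iteration with the business users would then remove candidates (for cost, performance or skill-set reasons) and rerun the filter, which is why the four decisions are narrowed together rather than one at a time.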

Application

The analytic tool or suite of tools will be the primary interface between the users and the analytical environment. The application(s) should satisfy all critical functional needs of the users. At the same time, it should provide adequate performance, scalability and ease of use.

Other criteria you should consider are the application's availability on multiple platforms (such as Windows and UNIX) and the skill set needed to support it. Finally, you should make sure that the licensing costs are within your budget. You may decide to use one tool, such as SAS, for all your needs or select multiple tools to provide users with the best tool for each specific function. However, you need to take into account the licensing and support costs for each additional application or tool and weigh them against the added benefit of providing specialized tools for specific functions. Our experience at Inductis indicates that specialized tools such as CART, MARS and TreeNet from Salford Systems provide a tremendous benefit in building predictive models with a high degree of accuracy.

Database

Selecting the right database is very important because data storage can make a huge difference in performance and ease of use. In some cases, you may choose to use flat files (e.g., with SAS). You can choose a relational database such as Oracle, IBM DB2 or Microsoft SQL Server, a fast and specialized analytical database such as QueraBase (from Enquera) or even a specialized system such as Teradata. You should expect to achieve higher performance as you use more specialized databases (e.g., benchmarking with Oracle and QueraBase has provided us, in some cases, with a three to five times improvement in performance of commonly used analytical functions).

While database selection can often be dictated by enterprise-wide standards, you should weigh the benefits of using a more specialized database against the additional cost before making the decision rather than blindly following the company standard.

Operating System

In many cases, the set of options for the operating system (OS) will be dictated by your IT department. For example, your company may support only Windows and HP-UX. You need to make sure that the application(s) and the database that you have selected are supported by the chosen OS platform. You also need to decide whether you want a single OS environment (e.g., all-Windows or all-UNIX) or a mixed environment (e.g., users with Windows desktops connected to UNIX servers).

As with database selection, you will need to consider the end use of the infrastructure. Will it be used by hundreds of users submitting ad hoc queries? Or is it a more structured load? You will also need to consider the level of user familiarity with the chosen OS as well as the cost of application and database licensing on the selected OS because these can vary significantly. As a general rule of thumb, you should select Windows if you want to optimize for ease of use (as often demanded by business users) but seriously consider UNIX if uptime is your primary concern and you want to be able to scale to hundreds of concurrent users.

Storage System

Last, but not least, is the type of storage system you want to use for your analytical technology infrastructure, for which several options are available. The cost of raw storage ranges from $1 per gigabyte to approximately $25 per gigabyte. For example, if you choose to provide users with individual storage units, you may be able to provide them with inexpensive 500GB USB drives. However, sharing common data and providing backup, retrieval and security will be very cumbersome and expensive. Alternatively, you may choose to go with a storage area network (SAN), which costs approximately $20 to $25 per gigabyte in addition to infrastructure setup costs. The leader in this space is EMC; however, products from EMC tend to be more expensive than products from emerging companies such as 3PAR. In the middle of the spectrum, there is a whole range of technologies - direct attached storage (DAS), network attached storage (NAS) and iSCSI - each of which offers a different trade-off between cost and performance.
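To make the per-gigabyte figures above concrete, the following sketch computes the raw storage cost range for a given capacity. The $1 low end and the $20 to $25 SAN range come from the figures quoted above; the 2,000 GB capacity and the function name `cost_range` are assumptions for illustration, and SAN infrastructure setup costs are deliberately left out.

```python
def cost_range(capacity_gb: float, low_per_gb: float, high_per_gb: float) -> tuple:
    """Return the (low, high) raw storage cost in dollars for a capacity in GB."""
    if capacity_gb < 0 or low_per_gb > high_per_gb:
        raise ValueError("capacity must be non-negative and low_per_gb <= high_per_gb")
    return (capacity_gb * low_per_gb, capacity_gb * high_per_gb)


if __name__ == "__main__":
    capacity_gb = 2000  # hypothetical requirement

    # Low end of the raw storage range quoted in the article ($1 per GB).
    low, _ = cost_range(capacity_gb, 1, 1)
    print(f"Low-end raw storage: ${low:,.0f}")

    # SAN range quoted in the article ($20 to $25 per GB), excluding setup costs.
    san_low, san_high = cost_range(capacity_gb, 20, 25)
    print(f"SAN: ${san_low:,.0f} to ${san_high:,.0f} plus setup costs")
```

The gap between the two results is the amount the shared-access, backup and security benefits of the more expensive option would need to justify.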

An important component of the storage system is the backup technology. Decisions need to be made about how, and at what frequency, backups will occur. These decisions will be driven by the business requirement of how much data the business can afford to lose as well as how quickly it needs to be recovered. Common methods for designing a backup system are disk-to-disk and disk-to-tape.
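The two business requirements named above (how much data can be lost and how quickly it must be recovered) can be checked against a candidate backup design with some simple arithmetic. This is a rough sketch under stated assumptions: it treats the worst-case data loss as one full backup interval, estimates restore time as data size divided by restore throughput, and uses invented figures throughout.

```python
def worst_case_loss_hours(backup_interval_hours: float) -> float:
    """Worst case: a failure just before the next backup loses one full interval."""
    return backup_interval_hours


def restore_hours(data_gb: float, restore_gb_per_hour: float) -> float:
    """Estimate restore time from data size and sustained restore throughput."""
    if restore_gb_per_hour <= 0:
        raise ValueError("restore_gb_per_hour must be positive")
    return data_gb / restore_gb_per_hour


def meets_requirements(backup_interval_hours: float, data_gb: float,
                       restore_gb_per_hour: float, max_loss_hours: float,
                       max_recovery_hours: float) -> bool:
    """Check a backup design against the tolerated data loss and recovery time."""
    loss_ok = worst_case_loss_hours(backup_interval_hours) <= max_loss_hours
    recovery_ok = restore_hours(data_gb, restore_gb_per_hour) <= max_recovery_hours
    return loss_ok and recovery_ok


if __name__ == "__main__":
    # Hypothetical design: nightly backups of 2,000 GB; the business tolerates
    # losing one day of data and needs recovery within 12 hours.
    for label, throughput in [("faster restore", 250), ("slower restore", 100)]:
        ok = meets_requirements(24, 2000, throughput, 24, 12)
        print(f"{label} ({throughput} GB/hour): {'meets' if ok else 'misses'} requirements")
```

The throughput values are placeholders; real figures for a disk-to-disk or disk-to-tape design would need to be measured or obtained from the vendor.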

Careful Planning Required

This article has focused on how to approach the four major technology selection decisions that support an analytics technology infrastructure. As you may have realized, this is a complex and time-consuming exercise requiring significant technical expertise across multiple dimensions. Before you dive in, you may want to evaluate whether your in-house staff has the expertise to make the necessary decisions or whether it would be more effective to bring in outside consultants. In either case, incorporating the business users' perspective is critical to ensure that the environment is designed with their specific needs in mind.
