JUN 2, 2009 4:35pm ET

Choosing a BI Appliance

The term business intelligence appliance is defined differently by different people; however, the broadly accepted definition is that it is server hardware and database software bundled specifically to meet data warehousing needs.

Just like the name suggests, BI appliances are turning out to have some similarities with other familiar appliances, i.e., their kitchen counterparts. They come in many variations, so it takes some amount of groundwork and research to make sure that you choose the right appliance that meets your needs.

I was involved in two BI appliance bake-offs over the last year, and through the exercise, we worked out a pretty good process that helped us assess our options and make our choice. I will note some of the key highlights of our approach.

First, the client was a large financial institution, and this initiative was at a departmental level. Our key objective was to consolidate four data marts into a single infrastructure because we were getting many requests to combine information across these marts and eliminate data latency between them. We were primarily facing challenges on the data load side, where a single run in some cases took more than 30 hours. There was performance pain on the query side as well, but it wasn’t very significant. As a secondary benefit, we were attempting to consolidate database licenses and servers across these four environments, including the development and test boxes for each environment. Like many large enterprises, we had outsourced server support to one of the large infrastructure support players and were being charged a hefty sum per server on a monthly basis.

Because the combined total data size was expected to be around 2TB, we were reluctant to even begin the process, as we had heard that appliance solutions only start to prove valuable at 5TB or more. However, we really needed something to help the load situation and decided to move ahead with this initiative.

We started with the usual “product evaluation” approach and broke it out into four key steps:
  1. Long list – based on Internet research.
  2. Short list – based on discussions with analyst firms and minimal interactions with the vendors.
  3. Proof of concept bake-off for the short list contenders.
  4. Final assessment and decision.

Step 1: Long List

For the long list of candidates we got most of the players from Gartner’s Magic Quadrant. We found that we could broadly classify these into a few categories:
  1. Hardware and software solution,
  2. Software solution or
  3. Hardware solution.

Step 2: Short List

We trimmed down the initial list based on our size and performance needs. During this process we used input from Gartner and Forrester and had minimal interaction with the actual vendor sales reps. We did consider customer references as part of the decision process.

That led us to two players as the main contenders for our POC bake-off. Interestingly, it turned out to be a mix of a “proprietary hardware plus software” player and a “commodity hardware plus software” player. Although one vendor appeared to be a startup, it was given high marks by the analysts and, more importantly, it seemed to be working very closely with Sun and even had common board members, which made its viability a little less of a concern.

Step 3: POC Bake-Off

We put together a set of clear and transparent guidelines for this process to ensure that we had an even playing field. Some of the things we laid out were:
  • We insisted that the POC be done on site. 
  • We would have one of our team members shadow the vendor engineer during the entire process to understand and report back on what it took. 
  • We also time-boxed this to one week of on-site activity.

We knew that the on-site requirement would mean getting a number of internal approvals to allow the vendor hardware to be set up in our data center, so we initiated that process during step one. By the time we had the final contenders, we had things in place from a legal/security perspective so that nothing would hold us up.

For the POC task, we identified one load process that was taking approximately 33 hours as a prime candidate. This process consisted of a set of Informatica jobs and involved picking up data from flat files and moving it to the staging area, then to the final schema and finally to a set of aggregate tables. Since we had existing investments in reporting and analytical applications, we had to ensure that the existing schema remained untouched so as to avoid changes to these BI applications.
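To make the shape of that load concrete, here is a minimal sketch of a flat file to stage to final schema to aggregate flow, with each step timed. It uses Python's built-in sqlite3 module purely as a stand-in; the actual jobs were Informatica jobs against the appliance, and every table name, column name and sample row here is a hypothetical placeholder rather than anything from the real environment.

```python
import csv
import io
import sqlite3
import time

# Hypothetical flat-file contents; the real feed layout is not described here.
FLAT_FILE = """account_id,region,amount
1,EAST,100.0
2,WEST,250.5
3,EAST,75.25
"""


def run_load(conn, flat_file_text):
    """Load flat file -> stage -> final -> aggregate; return seconds per step."""
    timings = {}
    cur = conn.cursor()
    # Hypothetical schema: one staging table, one final table, one aggregate.
    cur.executescript("""
        CREATE TABLE IF NOT EXISTS stage_txn (
            account_id INTEGER, region TEXT, amount REAL);
        CREATE TABLE IF NOT EXISTS final_txn (
            account_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
        CREATE TABLE IF NOT EXISTS agg_region (
            region TEXT PRIMARY KEY, total_amount REAL);
    """)

    # Step 1: flat file -> stage (clear the staging table, then bulk insert).
    start = time.perf_counter()
    rows = list(csv.DictReader(io.StringIO(flat_file_text)))
    cur.execute("DELETE FROM stage_txn")
    cur.executemany(
        "INSERT INTO stage_txn VALUES (:account_id, :region, :amount)", rows)
    timings["stage"] = time.perf_counter() - start

    # Step 2: stage -> final (insert new keys, replace rows for existing keys).
    start = time.perf_counter()
    cur.execute(
        "INSERT OR REPLACE INTO final_txn "
        "SELECT account_id, region, amount FROM stage_txn")
    timings["final"] = time.perf_counter() - start

    # Step 3: final -> aggregate (full rebuild of the summary table).
    start = time.perf_counter()
    cur.execute("DELETE FROM agg_region")
    cur.execute(
        "INSERT INTO agg_region "
        "SELECT region, SUM(amount) FROM final_txn GROUP BY region")
    timings["aggregate"] = time.perf_counter() - start

    conn.commit()
    return timings


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for step, seconds in run_load(connection, FLAT_FILE).items():
        print(f"{step}: {seconds:.6f}s")
```

Timing each step separately, rather than only the end-to-end run, is a design choice in this sketch: it shows which stage of a long load dominates before and after a platform change.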

The load process consisted of all three types of operations, i.e., inserts, deletes and updates. We were going to measure the load performance of this entire task, and then we would point one of our BI environments to the vendor appliance and benchmark some of the long-running reports and queries.

Both vendors shipped their boxes to our data center, and for logistical reasons, we had the vendors come in and work on their tasks in staggered weeks: vendor 1 in week 1 and vendor 2 in week 3. We would have preferred to run both in the same week, but that would have required some additional setup on our side.

In both runs, we ran into some technical snags in moving the raw data over to the appliance, and we worked around them by using a large USB storage device to move it over. The runs were fairly smooth and the engineers were very good. They knew what they were doing and were able to carry out the tasks with minimal issues. The performance runs of the whole process yielded mind-blowing results in both cases: the 33-hour process took less than 40 minutes. We also tested both appliances under mixed loads, i.e., loading data and running reports simultaneously, and we did not see much degradation. These systems have been architected to allow loads to run without impacting end users. Of course, this can also lead to read consistency issues if the overall process is not designed properly, but that’s a separate discussion.
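To put those numbers in perspective, the arithmetic can be worked through as a short sketch. It uses only the two figures quoted above (roughly 33 hours before, under 40 minutes after); because the baseline is approximate and the appliance time is an upper bound, the results are approximate lower bounds rather than measured values.

```python
def speedup(baseline_minutes, new_minutes):
    """Return how many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


baseline_minutes = 33 * 60   # the original load took roughly 33 hours
appliance_minutes = 40       # both appliances finished in under 40 minutes

factor = speedup(baseline_minutes, appliance_minutes)
reduction = 1 - appliance_minutes / baseline_minutes

print(f"at least {factor:.1f}x faster, {reduction:.1%} less elapsed time")
# prints "at least 49.5x faster, 98.0% less elapsed time"
```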
