The term business intelligence appliance is defined differently by different people; however, the broadly accepted definition is server hardware and database software bundled specifically to meet data warehousing needs.
Just like the name suggests, BI appliances are turning out to have some similarities with other familiar appliances, i.e., their kitchen counterparts. They come in many variations, so it takes some amount of groundwork and research to make sure that you choose the right appliance that meets your needs.
I was involved in two such bake-offs over the last year, and through the exercise, we worked out a pretty good process that helped us assess our options and make our choice. I will note some of the key highlights in our approach.
To set the context, the client was a large financial institution, and this initiative was at a departmental level. Our key objective was to consolidate four data marts into a single infrastructure because we were getting many requests to combine information across them and to eliminate data latency between them. We were primarily facing challenges on the data load side, where a run in some cases took more than 30 hours. There was performance pain on the query side as well, but it wasn't very significant. As a secondary benefit, we were attempting to consolidate database licenses and servers across these four environments, including the development and test boxes for each environment. Like many large enterprises, we had outsourced server support to one of the large infrastructure support players and were being charged a hefty sum per server on a monthly basis.
Because the combined total data size was expected to be around 2TB, we were reluctant to even begin the process as we had heard about the 5TB plus starting point for the appliance solutions to prove valuable. However, we really needed something to help the load situation and decided to move ahead with this initiative.
We started with the usual product evaluation approach and broke it out into four key steps:
- Long list based on Internet research.
- Short list based on discussions with analyst firms and minimal interactions with the vendors.
- Proof of concept bake-off for the short list contenders.
- Final assessment and decision.
Step 1: Long List
For the long list of candidates, we got most of the players from Gartner's Magic Quadrant. We found that we could broadly classify these into a few categories:
- Hardware and software solution.
- Software solution.
- Hardware solution.
Step 2: Short List
We trimmed down the initial list based on our size and performance needs. During this process we used input from Gartner and Forrester and had minimal interaction with the actual vendor sales reps. We did consider customer references as part of the decision process.
That led us to two players as the main contenders for our POC bake-off. Interestingly, it turned out to be a mix of a proprietary hardware plus software player and a commodity hardware plus software player. Although one vendor appeared to be a startup player, they were given high marks by the analysts and, more importantly, they seemed to be working very closely with Sun and even had common board members, which made their viability less of a concern.
Step 3: POC Bake-Off
We put together a set of clear and transparent guidelines for this process to ensure that we had an even playing field. Some of the things we laid out were:
- We insisted that the POC be done on site.
- We would have one of our team members shadow the vendor engineer during the entire process to understand and report back on what it took.
- We also time-boxed this to a one-week on-site activity.
We knew that the on-site requirement would mean getting a number of internal approvals to allow the vendor hardware to be set up in our data center, so we initiated that process during step one. By the time we had the final contenders, we had things in place from a legal/security perspective so that nothing held us up. For the POC task, we identified one load process that was taking approximately 33 hours as a prime candidate. This process consisted of a set of Informatica jobs and involved picking up data from flat files and moving it to stage, then to the final schema and finally to a set of aggregate tables. Since we had existing investments in reporting and analytical applications, we had to ensure that the existing schema remained untouched so as to avoid changes to these BI applications.
The load process consisted of all three types of operations, i.e., inserts, deletes and updates. We were going to measure the load performance of this entire task, and then we would have one of our BI environments point to the vendor appliance and benchmark some of the long-running reports and queries.
Both vendors shipped their boxes to our data center, and for logistical reasons, we ended up having the vendors come in and work on their tasks in staggered weeks. So we had vendor one in week one and vendor two in week three. We would have preferred this to be the same week, but that would have required some additional setup on our side.
In both runs, we ran into some technical snags in moving the raw data over to the appliance, but we used one of the big USB devices to move it over. The runs were fairly smooth and the engineers were very good. They knew what they were doing and were able to carry out the tasks with minimal issues. The performance runs of the whole process yielded mind-blowing results in both cases. The 33-hour process took less than 40 minutes in both cases. We also tested both using mixed loads, i.e., loading data and running reports simultaneously, and we did not see much degradation. These systems have been architected to allow for loads without impacting end usage. Of course, this can also lead to some read consistency issues if the overall process is not designed properly, but that's a separate discussion.
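To put those load times in perspective, here is a small Python sketch of the speedup arithmetic. It uses only the two figures quoted above (a roughly 33-hour baseline and an upper bound of 40 minutes on the appliance run); the function name `speedup` is just an illustrative helper.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    if baseline_minutes <= 0 or new_minutes <= 0:
        raise ValueError("durations must be positive")
    return baseline_minutes / new_minutes


baseline = 33 * 60  # existing load process: roughly 33 hours, in minutes
appliance = 40      # upper bound from the POC: the run took under 40 minutes

# Because 40 minutes is an upper bound, the result is a lower bound.
print(f"at least {speedup(baseline, appliance):.1f}x faster")
# prints "at least 49.5x faster"
```

Since both runs finished in under 40 minutes, the real improvement was somewhat better than this lower bound.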
Both of these appliances had proven themselves with good performance, with very little difference (less than four minutes) between them. So, the decision process now switched from performance to price/performance, price here being total cost of ownership over a three-year period, including accounting for the projected data growth.
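As a rough illustration of that kind of comparison, the Python sketch below computes a three-year total cost of ownership with compounding data growth, plus one possible price/performance figure. Everything in it is an assumption made for illustration: the `Quote` fields, the cost model (upfront price, flat annual support, a per-TB expansion charge), the growth rate and all dollar amounts are hypothetical and are not the numbers from our evaluation.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical vendor quote; every field is an illustrative assumption."""
    name: str
    upfront: float            # hardware plus software purchase price
    annual_support: float     # yearly maintenance and hosting charges
    cost_per_added_tb: float  # expansion cost per extra TB of capacity
    load_minutes: float       # load time measured in the POC


def three_year_tco(q: Quote, start_tb: float, annual_growth: float,
                   years: int = 3) -> float:
    """Sum upfront, support and growth-driven expansion costs."""
    total = q.upfront
    size = start_tb
    for _ in range(years):
        grown = size * (1 + annual_growth)  # data volume at end of the year
        total += q.annual_support + (grown - size) * q.cost_per_added_tb
        size = grown
    return total


def price_performance(q: Quote, start_tb: float, annual_growth: float) -> float:
    """One possible metric: TCO multiplied by load time (lower is better)."""
    return three_year_tco(q, start_tb, annual_growth) * q.load_minutes


# Made-up quotes for two vendors, starting at 2 TB and growing 30% a year.
quotes = [
    Quote("proprietary", 500_000, 60_000, 40_000, 36),
    Quote("commodity", 450_000, 70_000, 30_000, 39),
]
for q in quotes:
    tco = three_year_tco(q, start_tb=2.0, annual_growth=0.30)
    print(f"{q.name}: TCO ${tco:,.0f}, "
          f"price/performance {price_performance(q, 2.0, 0.30):,.0f}")
```

When the measured load times differ by only a few minutes, as they did for us, a metric like this is driven almost entirely by the cost side.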
We did this particular exercise in late 2008, and the economy was such that it was already a buyer's market. Both vendors were willing to bend over backward to close the deal. So, even price/performance was becoming a difficult metric to base the decision on.
We finally decided to go with the commodity hardware plus software solution rather than the proprietary hardware plus software solution. Our main reasoning was that the commodity hardware based system would stand to gain from the billions of dollars of R&D being carried out by hardware giants like Intel and AMD. This meant that we would have to go with a relatively unknown player, but the strong backing mitigated the risk.
The BI appliance industry is evolving rapidly, and new players are pushing hard to make their solutions affordable even to sub-billion-dollar enterprises. I recommend that even enterprises dealing with just a terabyte of data (with expected data volume growth) explore the possibility of introducing a BI appliance into their ecosystem. I can almost assure you that you will be surprised by the performance gains from these systems on both the data load side and the query side. Finally, I believe the commodity hardware plus software players will begin to dominate the marketplace.