Robin Bloor, chief analyst and president, The Bloor Group; founder, Bloor Research
You always have your ear to the ground. What's got you interested right now?
I've been doing a fair amount of work on parallelism recently, which I think is kind of interesting because the data industry is going to have to deal with a real fundamental problem there.
You mean in terms of processing data?
Yes. It was around 2005 that chip manufacturers reached the point where they couldn't really scale up chips anymore simply by increasing the clock speed. The films on the silicon surface had gotten so thin that pretty much all the power was being lost as heat, so much so that it wasn't possible to run anything faster than about 4 gigahertz. The effect of that was that the silicon boys had to find another way of increasing the power of chips, or they would have lost their obsolescence model, and that would have ruined their whole industry.
And then we got multi-core processors, right?
Right, they started to put multiple cores on chips, and even though clock speeds had hit a wall, miniaturization let them progress for a couple more generations. More cores on a chip are great if you've got software that can use them. But right now, there's very little software that can make use of multi-core. All people really need to know is that chips now have multiple cores and that software needs to be written for multiple cores. If you've got software that can only run on a single core on a single server, why would you be buying multi-core chips to run it? That would be as inefficient as adding another server for every application, when you should be trying to use the power that's there.
What about databases?
Pretty much all the large databases were written for parallel operation. Databases can have a lot of concurrent users, where the issue is more breadth than speed, and that's one reason they're good candidates for parallelism. That parallelism was aimed at SMP, symmetric multiprocessors, which was often a cluster of boxes. The difference between a cluster and a multi-core CPU is that you have a fiber link between the memories of the various machines in the cluster, whereas you don't need the fiber in multi-core. Oracle and DB2 were written for parallel operation in the 1990s and, therefore, they can use multi-core hardware. So this doesn't make a big difference to many people already on SMP, though over time they could move the databases they've got onto multi-core chips. That's a kind of upgrade, but we're not talking about new applications. The opportunity in applications is in ETL, in data cleansing and data integration, areas where multi-core parallel operation needs to come into play and hasn't yet.
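To make that opportunity concrete, here is a minimal Python sketch of what moving a data-cleansing step from one core to all available cores might look like. The cleansing rule, the field names, and the records are hypothetical stand-ins, not any particular ETL product's API.

# A minimal sketch: the same cleansing function run serially on one core,
# then split across all available cores. Data and rule are hypothetical.
from multiprocessing import Pool, cpu_count

def clean_record(record):
    # Hypothetical cleansing rule: trim whitespace and normalize case.
    return {field: value.strip().lower() for field, value in record.items()}

if __name__ == "__main__":
    records = [{"name": "  Alice ", "city": "LONDON"},
               {"name": "BOB  ", "city": " paris"}] * 100_000

    # Serial: one core does all the work, the others sit idle.
    cleaned = [clean_record(r) for r in records]

    # Parallel: the same work split across every available core.
    with Pool(processes=cpu_count()) as pool:
        cleaned = pool.map(clean_record, records, chunksize=1000)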
What's a simple way for us to understand or explain the difference between serial and parallel processing?
Most people know how to write a set of instructions, and that's all programming really is. And most people are well aware of how to serially express something like doing the laundry: you gather the wash, separate the whites from the colors, put the whites in, set the machine, add the soap, turn it on. You can write that just like a program. Now try writing that in parallel. Most people wouldn't have a clue where to start. To do the wash in parallel, you'd start out with many washing machines and many dryers, and your parallel operation would be to take an item and stick it in one machine, take another and stick it in another machine. It doesn't make sense all the time, but in computing it makes tons of sense to split up common work so that everything goes a lot faster. But most people simply don't know how to write the instructions.
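As an illustration, here is a small Python sketch of that laundry example written both ways; wash_load, the load names, and the one-second delay are assumed stand-ins for real work.

# The laundry analogy as code: the same loads done one after another,
# then done at the same time on "many machines" (worker processes).
import time
from concurrent.futures import ProcessPoolExecutor

def wash_load(load):
    # Stand-in for the time one washing machine takes on one load.
    time.sleep(1)
    return load + " done"

loads = ["whites", "colors", "towels", "delicates"]

if __name__ == "__main__":
    # Serial: one machine, one load after another (roughly four seconds).
    for load in loads:
        print(wash_load(load))

    # Parallel: one machine per load, all running at once (roughly one second).
    with ProcessPoolExecutor(max_workers=len(loads)) as pool:
        for result in pool.map(wash_load, loads):
            print(result)

The parallel version is still just a set of instructions; the hard part is spotting that the loads are independent pieces of work that can run at the same time.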
Isn't virtualization supposed to sort that out, pooling core resources so they aren't tied to a specific piece of software?
First of all, virtualization didn't need parallelization. The original virtualization by IBM was so long ago that a lot of us were still in short pants. But virtualization of operating systems was done because of the richness of power in a given CPU chip, not because of multi-core chips. The hardware actually overdelivered for all the time they were doubling the power of chips every 18 months. Windows NT and Linux came along, and people were buying commodity servers, basically one application per server. You don't want all the work of monitoring how resources should be shared between one application and another, so you just put in a new server with every application. But people came to see in data centers that CPU utilization for servers was something like 6 to 10 percent, and this was on single core. So you can put virtual machines in place and now you've got lots of spare capacity without even thinking about multiple cores.
Right, and that's a good thing.
But with every virtual machine, you're putting in a new operating system, and that footprint means a lot of management problems. If you've got four virtual machines sharing a single CPU or a single server, you can have peak loads on all of them that will knock each other out. Also, if that server fails, you have four recoveries to make rather than one. So the management for virtual machines is not trivial. In information management, virtualization is mostly applied to low-hanging fruit like development and testing and a few other things, but once you get to about 30 percent of applications, you start to run out of things that don't have too many dependencies to manage.