Bill Kramer is the deputy director of the National Science Foundation’s Blue Waters supercomputer project at the University of Illinois’ National Center for Supercomputing Applications. In that role, the veteran of 20 supercomputer implementations has been part of an administrative team that expects to take computing another leap forward with its petascale machine, Blue Waters. Ahead of Monday’s “Petascale Day” at the university, Kramer and Information-Management.com discussed the real-world challenges of implementing huge data systems, the groundbreaking medical and astronomical research Blue Waters could produce, and the ripple effect the supercomputer will have on parallel processing and bandwidth constraints.
Information-Management.com: Bring me up to speed on where Blue Waters stands and where it’ll be over the next few months.
Kramer: Since November, we have almost completed everything that was laid out. All the equipment is delivered and connected and operating. We had about three months of what we called an early science program, running a subset of the machine with its full expected level of production quality, and we had 15 of our 26 science teams on the machine. Those accomplishments are being published now. And that small system, from our point of view, was actually larger than anything the National Science Foundation has ever had in its repository of computing. It was about 18 percent of the compute power and 5 percent of the storage bandwidth and capacity of the final system. With reliability and performance testing that might run into … the early part of 2013, we’re doing pretty well with staying on a more aggressive schedule of five years of sustained use for the national science community.
Sometimes with these supercomputers, there is that braggadocio over data speeds and processing. The focus here really does seem to be on the output: the research, the methodology, the business uses. What are some of those real-world applications you expect to come from Blue Waters in the near term?
We have architected a system to support the widest possible variety of uses. It is one of the fastest computational systems if you count peak flops, but it also has one of the largest amounts of memory of any system in one place, and the most data and storage capability, if you count the amount of capacity we have. And its bandwidth is not surpassed by any other system in the world that we know of. From all aspects, it can probably deal with any problem anyone poses as well as any system in its class. We have 32 science teams already assigned to the system, investigating questions ranging from, say, ‘How does a virus penetrate a cell and inject its genetic material?’ … to highly accurate severe weather forecasting. We also have a team looking at what we call systems of systems: for a system such as a network of satellites that monitor a function on Earth, how do you optimize that system in the case of failure? That could be applied to many kinds of business systems … These are all at scales that could not have been done before, but they’re also always designed with problem solving in mind.
From our vantage point, we’re hearing almost non-stop about “big data” in relation to enterprise systems and information on anything from bank transactions to Facebook discussions. What you’re dealing with in Blue Waters and at other supercomputers would be, in comparison, huge data. But, given the advances in computing power and expectations with data, do you see a ripple through to the enterprise and business level?
We’re at a scale that is unprecedented, and we have to learn how to deal with failure and inconsistent performance, and how to run things in parallel. In any parallel system, one slow unit out of one thousand will slow down everybody. So we’re also looking at how to detect those problems and at improving methodologies for accessing data. As systems improve, even smaller systems 10 years from now, or to some degree 5 years from now, ones much more affordable for a business or a region, will handle the same amount of data we’re dealing with today. The methods that we and our vendor partners develop to deal with highly parallel workloads will be commonplace not too long from now. One big aspect of all that is that bandwidth is becoming the most limited and the most expensive resource. You can build more and more processors, but what really limits their ability to do sustained work is the bandwidth that gets data in and out of the systems. We’ll be running some significant educational programs on that front, not only on our systems but on how to do it well in ways that apply to other machines. Five percent of the system is also devoted to our industrial partners, for work ranging from validation of parallel methods to … the largest possible aircraft and its mechanical systems.
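Kramer’s point that one slow unit out of a thousand slows everybody down can be illustrated with a toy model. The sketch below assumes a bulk-synchronous pattern, in which every worker must finish a step before the next step begins; the function names, worker count and timings are hypothetical and are not Blue Waters measurements.

```python
# Toy model of the "one slow unit slows everybody" effect.
# Assumption: bulk-synchronous execution, so a step ends only when
# the slowest worker finishes. All numbers below are hypothetical.

def step_time(worker_times):
    """Wall-clock time of one synchronized step: the slowest worker decides."""
    return max(worker_times)

def parallel_efficiency(worker_times):
    """Average busy time divided by step time (1.0 means nobody waits)."""
    average = sum(worker_times) / len(worker_times)
    return average / step_time(worker_times)

# 1,000 workers: 999 finish in 1.0 s, one straggler needs 2.0 s.
times = [1.0] * 999 + [2.0]
print(step_time(times))                      # 2.0
print(round(parallel_efficiency(times), 4))  # 0.5005
```

Under this assumption, a single straggler running at half speed roughly halves the useful throughput of all 1,000 workers, which is why detecting slow units matters so much at this scale.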
There’s an educational component to all of this as well. Tell me about the drive to pass on unique learning experiences to people involved with Blue Waters in data science efforts such as data mining, analytics and big data.
We have targets for graduates, for faculty to develop courseware with them, and for undergraduates to get involved in internships … there are also workshops and seminars. We bring students and faculty together to collaborate on this courseware. We just finished our third year of running a virtual school for computational science and engineering. It uses distributed teaching across up to 16 institutions, teaching hundreds of people at the graduate level about data sciences and petascale computing methods. And it’s a hands-on workshop, with labs delivered over high-definition video broadcasts to the participating institutions. After the system is in full service, 1 percent of Blue Waters will be dedicated to classes, training and workshops. And that 1 percent, in aggregate, is as much computing power as NCSA has had over 25 years.
Now, in getting Blue Waters in place, there was a notable changing of the guard last year when IBM bowed out and Cray was chosen as a replacement. Your system is unique, yes, but what lessons did you take away from dealing with a drastic change in data systems planning midstream?
There is the term “co-design” for designing a set of applications and a system in parallel, with cross-talk between the two, so that each changes how it does business. When you do something that far in advance, there are challenges to overcome. One of the things I learned is that it’s probably unwise to try to serve too many people in the co-design process. You want what you produce to span many areas, but if too many people impose too many requirements that aren’t shared across the design, it becomes cumbersome or unfeasible. And that’s where we ended up, due to issues of cost, technology and time-to-market.
(Editor's Note: The Great Lakes Consortium for Petascale Computation is a group of more than two dozen institutions that have supported the infrastructure and software that make up Blue Waters. Through Oct. 31, the group is accepting proposals to use the supercomputer for work on large-scale applications.)