How IBM's Watson Churns Analytics
(This story is a sidebar to our initial mainbar story IBM's Watson Challenge No Gimmick. -ed)
IBM's Jeopardy-playing Watson computing system is attracting attention for reasons beyond the main spectacle of a talking computer playing in real time and beating all-time champions.
One angle is the back end of hardware, processing and software technology assembled over four years. IBM's Watson system is based on the company's POWER7 systems designed for analytical processing, a product line the company claims to have invested $3.2 billion in over the last four years.
The Watson system built for playing Jeopardy consists of 10 "refrigerator-sized" racks of POWER 750 servers running Linux, with 2,880 processor cores and 15 terabytes of RAM, and it works exclusively in memory. The system is loaded with the equivalent of 200 million text pages of information. No external systems are connected to Watson for game play. The system is capable of operating at 80 teraflops, or 80 trillion operations per second.
Each of the cores can sort through the 15 terabytes of RAM independently with a bandwidth of 500 gigabytes per second, according to IBM.
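As a rough back-of-envelope check, the quoted figures can be turned into per-core numbers. The sketch below is illustrative only and is not IBM's own calculation. It assumes decimal units (1 terabyte = 10^12 bytes), assumes the 80-teraflop figure is spread evenly across all cores, and assumes the 500 gigabytes per second applies to one full pass over the 15 terabytes. The function names are made up for this example.

```python
# Figures quoted in the article; decimal units are an assumption.
CORES = 2_880
RAM_BYTES = 15 * 10**12              # 15 terabytes of RAM
PEAK_FLOPS = 80 * 10**12             # 80 teraflops
BANDWIDTH_BYTES_PER_S = 500 * 10**9  # 500 gigabytes per second


def per_core_gflops(peak_flops=PEAK_FLOPS, cores=CORES):
    """Peak throughput per core in gigaflops, assuming an even split."""
    return peak_flops / cores / 1e9


def ram_per_core_gb(ram_bytes=RAM_BYTES, cores=CORES):
    """RAM per core in gigabytes, assuming an even split."""
    return ram_bytes / cores / 1e9


def full_scan_seconds(ram_bytes=RAM_BYTES, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Seconds to read all of RAM once at the quoted bandwidth."""
    return ram_bytes / bandwidth


if __name__ == "__main__":
    print(f"Per-core peak: {per_core_gflops():.2f} gigaflops")   # about 27.78
    print(f"RAM per core:  {ram_per_core_gb():.2f} gigabytes")   # about 5.21
    print(f"Full RAM scan: {full_scan_seconds():.0f} seconds")   # 30
```

Under these assumptions, a single sweep of the full 15 terabytes at the quoted bandwidth would take about 30 seconds, which suggests why answering within a few seconds depends on dividing the work across many cores rather than scanning everything serially.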
IBM and other manufacturers have grown their expertise in building new generations of high-performance systems tuned to the task at hand, what the computing industry calls workload optimization. Playing Jeopardy requires a system that is very adept at near-instant recall, but such a system would be unsuited to other uses, like processing and storing digital imagery for a major Hollywood movie.
"For workload optimized systems, general purpose software on general purpose hardware certainly has its place," says Bernie Spang, IBM's director of strategy and product marketing for database software systems. "There's a difference between setting up a system for pure transaction processing versus deep analytics versus a mix. In our case it's not just going with a faster processor, but the whole bandwidth I/O for memory, storage, the whole architecture."
A mix of shared open-source components plays a large role in Watson's analytic proficiency, including Linux, Hadoop and the Unstructured Information Management Architecture (UIMA), which stores content "intelligently" based on how it is digested in the learning phase of the process.
Watson failed miserably at many trials before the almost innumerable details could be sorted through.
"This is triage," said David Ferrucci, the lead Watson researcher at IBM. "There are a million things to consider here."
Charles King, a longtime observer and principal analyst at Pund-IT, says we'll see a lot more purpose-built, workload-optimized systems and services in the near future. "If you want an answer in two minutes instead of two seconds you'll pay for a certain kind of hardware or service. If you're talking about a system you query at 11:00 p.m. for an answer the next morning and it costs $1.50, well then, wow, that's better than spending a half hour on Google."