Every complex system needs a way to prevent failures or mistakes. It's one reason why cars have brakes and electrical systems have circuit breakers.
And it's one reason why Carnegie Mellon University has landed a grant of $2 million to $2.5 million to develop ways to overcome failures in the next generation of supercomputers, which are used to create models and simulations for scientific research.
To accomplish that, the U.S. Department of Energy has created the Petascale Data Storage Institute and awarded $11 million in grants to researchers at Carnegie Mellon, the University of California at Santa Cruz and the University of Michigan.
Garth Gibson, associate professor of computer science and electrical and computer engineering at Carnegie Mellon, will lead the institute. "This is a demonstration of the leadership of the Pittsburgh area in information or data-storage technology," Dr. Gibson said.
The institute also will involve researchers at DOE's Los Alamos, Sandia, Oak Ridge, Lawrence Berkeley, and Pacific Northwest national laboratories.
In coming years, supercomputers will use as many as a million processors to create "petaflop" processing speeds. Petaflop means the computer can do at least a quadrillion calculations per second.
One of Carnegie Mellon's goals during the five-year grant period will be to develop ways to manage the torrents of data generated by petascale supercomputers, Dr. Gibson said. More sophisticated data-storage methods are needed before researchers can devise ways of overcoming supercomputer failures.
Current methods of coping with failures involve taking periodic snapshots of data and storing them. If a failure occurs, the supercomputer can return to the most recent snapshot in storage rather than start over.
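The snapshot approach, often called checkpoint and restart, can be illustrated with a toy program. What follows is a minimal sketch and not the institute's software: the function names, the file format and the stand-in "simulation" (a running sum) are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Store a snapshot of the state (hypothetical helper).

    The snapshot is written to a temporary file and then renamed, so a
    crash in the middle of a write cannot damage the previous snapshot.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the most recent snapshot, or None if none has been stored."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_simulation(path, total_steps, checkpoint_every, fail_at=None):
    """Run a toy computation, resuming from the last snapshot if one exists.

    fail_at is an illustrative knob that simulates a hardware failure
    just before the given step.
    """
    state = load_checkpoint(path) or {"step": 0, "value": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated hardware failure")
        state["value"] += state["step"]  # stand-in for real scientific work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

In this sketch, a run that fails partway through loses only the work done since the last snapshot. Calling `run_simulation` again with the same file picks up from that snapshot instead of from step zero.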
As computers get faster, more sophisticated failure-tolerance strategies become necessary, Dr. Gibson said. Existing supercomputers that do trillions of calculations per second fail once or twice a day. But petascale supercomputers could fail once every few minutes, researchers said.
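A rough calculation shows why more frequent failures strain the snapshot approach. The sketch below uses a well-known rule of thumb, Young's approximation, for how often to take snapshots. It is not a method attributed to the researchers, and the one-minute snapshot cost and both failure intervals are assumed figures chosen only to mirror the contrast described above.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's approximation for the work time between snapshots.

    checkpoint_cost: time to write one snapshot.
    mtbf: mean time between failures, in the same units.
    """
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def wasted_fraction(interval, checkpoint_cost, mtbf):
    """First-order estimate of the share of machine time lost.

    The first term is time spent writing snapshots. The second is work
    redone after a failure, on average half an interval per failure.
    The estimate is only rough when the interval is close to the mtbf.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Assumed figures, in minutes: a one-minute snapshot, and a machine that
# fails about twice a day versus one that fails every five minutes.
CHECKPOINT_COST = 1.0
for label, mtbf in [("fails about twice a day", 720.0),
                    ("fails every five minutes", 5.0)]:
    interval = young_interval(CHECKPOINT_COST, mtbf)
    waste = wasted_fraction(interval, CHECKPOINT_COST, mtbf)
    print(f"{label}: snapshot every {interval:.1f} min, "
          f"about {waste:.0%} of time lost")
```

Under these assumptions, the machine that fails twice a day loses roughly 5 percent of its time to snapshots and redone work, while the machine that fails every five minutes loses well over half.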
The institute's ultimate goal is to allow scientists to use supercomputers to create models of global warming, earthquake motions, fuel-efficient engines, nuclear fusion and the global spread of disease at ever-higher levels of detail. "We need to do simulations of whole Earth climate that take the computer months to complete," Dr. Gibson said. "We can't be stopped by failures."
