In November of 2012, the semiannual Top500 rankings of the world’s supercomputers gave top billing to a machine constructed at the Oak Ridge National Laboratory, in Tennessee. Aptly named Titan, the machine boasted a peak performance of more than 27 × 10¹⁵ floating-point operations per second, or 27 petaflops. It was an immense computing resource for researchers in government, industry, and academe, and being at the top of the supercomputing heap, it helped to boost pride within the U.S. high-performance computing community.
The satisfaction was short-lived. Just seven months later, Titan lost the world-supercomputing crown to a Chinese machine called Tianhe-2 (Milky Way-2). And three years on, yet another Chinese number-crunching behemoth—the Sunway TaihuLight—took over the title of world’s most powerful supercomputer. Its peak performance was 125 petaflops. After that, Titan wasn’t looking so titanic anymore.
Using the Sunway TaihuLight, Chinese researchers captured the 2016 Gordon Bell Prize for their work modeling atmospheric dynamics. “That shows it wasn’t just a stunt machine,” says Jack Dongarra of the University of Tennessee, one of the creators of the Top500 rankings.
You might be wondering why for the past five years the United States has seemingly given up on reclaiming the top spot. In fact, there was no such surrender. In 2014, U.S. engineers drafted proposals for a new generation of supercomputers. The first of these will bear fruit later this year in the form of a supercomputer named Summit, which will replace Titan at Oak Ridge. The new machine’s peak performance will be around 200 petaflops when it comes on line in a few months, which will make it the most powerful supercomputer on the planet.
“We’re very open in the U.S. with our machines,” says Arthur “Buddy” Bland, project director of the Leadership Computing Facility at Oak Ridge. Because those plans are public, he’s confident that Summit will be completed as planned and that it will be the most powerful supercomputer in the United States. But in the meantime, China, or some other country for that matter, could field a new supercomputer or upgrade an existing one to exceed Summit’s performance. Could that really happen? “We have no idea,” says Bland.
He and his colleagues at Oak Ridge aren’t losing any sleep over the question—and they need all the sleep they can get these days because they still have a lot of work ahead of them as they labor to replace Titan with Summit. They are not, however, following the pattern that they used to build Titan, which was created as a result of a series of increasingly elaborate upgrades to an earlier Oak Ridge supercomputer called Jaguar.
Jaguar was installed in 2005, when computing hardware became obsolete very quickly (as anyone who purchased a personal computer in that era will attest). “We’d do an upgrade every year,” recalls Bland. Jaguar became the most powerful supercomputer in the world in 2009. An even more significant upgrade that began in 2011 allowed Jaguar to be reborn as Titan in 2012.
Why not just upgrade the machine’s internal hardware again instead of building a whole new supercomputer? “We think upgradability is a valid goal,” says Bland—but not one that works in this case, because Titan uses hardware from Cray. “Now we’re going to a machine from IBM: It would not have been possible or reasonable to recycle.” So Titan will keep running for now, but it will be shut down about a year after Oak Ridge’s new supercomputer becomes operational.
One advantage the all-new supercomputer will bring to Oak Ridge is a significant boost in power efficiency. Summit should be able to run researchers’ simulations 5 to 10 times as fast as Titan could, using just twice the power. Typical requirements will be around 15 megawatts. Happily enough, the power will come from the Tennessee Valley Authority’s amply endowed electric grid. Others may find it more challenging to power a modern supercomputer, Bland notes. “Go to your local power company and ask, ‘Where can I plug in my 15-MW computer?’ and see what they tell you,” he quips.
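The efficiency gain implied by those figures can be checked with simple back-of-the-envelope arithmetic. A minimal sketch, using only the speedup (5 to 10 times) and power ratio (twice) quoted above:

```python
# Back-of-the-envelope: performance-per-watt gain implied by
# running 5-10x as fast on 2x the power (figures from the article).
def perf_per_watt_gain(speedup, power_ratio):
    """Ratio of (new performance/watt) to (old performance/watt)."""
    return speedup / power_ratio

low = perf_per_watt_gain(5, 2)    # lower bound of the quoted range
high = perf_per_watt_gain(10, 2)  # upper bound of the quoted range
print(f"Summit vs. Titan, per watt: {low}x to {high}x")
```

In other words, the quoted numbers work out to roughly a 2.5- to 5-fold improvement in performance per watt.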
Although Summit will be the most capable, it’s not the only U.S. supercomputer of its class coming on line in 2018. A supercomputer called Sierra, which is expected to exceed 120 petaflops of peak performance, will be completed at Lawrence Livermore National Laboratory, in California. Argonne National Laboratory, too, was slated to begin operating a new supercomputer, one offering 180 petaflops of peak performance, in 2018. But the Illinois lab’s plans for constructing that machine, called Aurora, have been delayed until 2021 in an attempt to expand its capabilities and make it the first U.S. “exascale” (1,000 petaflops, or 1 exaflop) supercomputer.
These huge numbers refer to peak performance, but real-world applications make use of only a fraction of that potential. The often-quoted Linpack benchmark typically runs at 75 percent of a supercomputer’s peak, says Dongarra. “Our dirty little secret is that most real applications are like 3 percent.”
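The gap Dongarra describes is easy to quantify. A minimal sketch, applying the fractions quoted above (Linpack at roughly 75 percent of peak, typical applications at roughly 3 percent) to Summit's projected 200-petaflop peak:

```python
# Peak vs. sustained performance, using the rough fractions quoted
# above: Linpack ~75% of peak, typical real applications ~3%.
PEAK_PFLOPS = 200  # Summit's projected peak, from the article

linpack_sustained = PEAK_PFLOPS * 0.75  # what the benchmark achieves
typical_app = PEAK_PFLOPS * 0.03        # what real codes often achieve

print(f"Linpack: ~{linpack_sustained:.0f} petaflops")
print(f"Typical application: ~{typical_app:.0f} petaflops")
```

By that rough arithmetic, a 200-petaflop machine delivers on the order of 150 petaflops to the benchmark but only around 6 petaflops to an ordinary application.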
Clearly, figuring out clever ways to boost actual performance matters as much as the number of peak flops theoretically available. And the supercomputer specialists at Oak Ridge are putting plenty of their energies into that effort, too. Joseph Oefelein, who will be using Summit in his studies of the physics and chemistry of combustion at Georgia Tech, puts it succinctly: “There’s more to this game than saying you have the fastest computer.”
This article appears in the January 2018 print magazine as “U.S. Supercomputing Strikes Back.”