Illustration: Sean McCabe; original photo: May Truong
Michael McCool of RapidMind
Just three weeks before the 2006 Game Developers
Conference in San Jose, IBM had a problem. The company
desperately needed a boffo, unforgettable piece of
computer-generated imagery to demonstrate the power of
the new Cell nine-core microprocessor, which Big Blue
had just developed with Sony and Toshiba. The chip,
produced at a cost of US $400 million, was set to debut
in Sony’s new PlayStation 3 game console in November,
but developers who had been tearing their hair out
trying to program games for the Cell’s new architecture
didn’t yet have any seriously flashy footage to present
at the March show.
So IBM turned to the chicken wrangler.
Actually, he’s a 39-year-old computer science
professor and software entrepreneur named Michael
McCool. In just one weekend, his company, RapidMind, in
Waterloo, Ont., Canada, used the programming platform
that McCool has been working on for nearly a decade to
create a crowd simulation of 16 000 individual chickens.
Imagine the biggest flock of virtual fowl ever
assembled. Each chicken is controlled by a simple
artificial intelligence program, operating according to
a handful of rules. Each chicken wants to move toward
the rooster but must avoid collisions with other
chickens, fences, and the barn. To do so, each one must
constantly check the position of its nearest neighbors
and other objects in its environment and then decide how
to move.
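To make those rules concrete, here is a minimal Python sketch of one simulation tick. It is not RapidMind's code: the function name `step`, the parameters `min_gap` and `speed`, and the simple push-apart rule are all assumptions made for illustration, and the fences and the barn are left out.

```python
import math

def step(chickens, rooster, min_gap=1.0, speed=0.5):
    """Advance every chicken one tick; returns a new list of (x, y) positions.

    Hypothetical sketch: chickens is a list of (x, y) tuples, rooster is (x, y).
    """
    new_positions = []
    for i, (x, y) in enumerate(chickens):
        # Rule 1: head toward the rooster (unit vector).
        dx, dy = rooster[0] - x, rooster[1] - y
        dist = math.hypot(dx, dy)
        if dist > 0:
            dx, dy = dx / dist, dy / dist
        # Rule 2: check every other chicken and push away from any
        # neighbor that is closer than min_gap.
        for j, (ox, oy) in enumerate(chickens):
            if i == j:
                continue
            gap = math.hypot(x - ox, y - oy)
            if 0 < gap < min_gap:
                dx += (x - ox) / gap
                dy += (y - oy) / gap
        # Normalize the combined direction so each chicken moves at most
        # `speed` per tick; a zero vector means the chicken stays put.
        norm = math.hypot(dx, dy)
        if norm > 0:
            dx, dy = dx / norm, dy / norm
        new_positions.append((x + speed * dx, y + speed * dy))
    return new_positions
```

Each chicken's new position depends only on where everyone stood at the previous tick, so the iterations of the outer loop are independent of one another. That independence is what lets work of this kind be spread across many cores.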
If that doesn’t sound all that impressive to you,
consider this: all 16 000 of those faux chickens are
doing this maneuvering at the same time on a single Cell
microprocessor. It is a chore that would tax a rack full
of conventional servers.
After viewing the virtual barnyard at the IBM booth
during the game conference, one new fan gave the
RapidMind team a rubber chicken. The company’s
developers stashed the gag gift near an air-hockey table
in the office rec room. Now, every time programmers hit
a new performance benchmark, one of them grabs the
chicken and squeezes until it emits an unholy scream.
The masterminds at RapidMind thoroughly abused that
poor bird as they prepared for last month’s release of
the RapidMind Development Platform 2.0, the first
software tool to help programmers write code for
microprocessor chips like the Cell as well as for
graphics processors from ATI Technologies, Nvidia Corp.,
and other companies. What the processors have in common
is that they are all multicore chips—that is, each
individual chip has several or even dozens of processing
units, called “cores.” By the middle of this year,
RapidMind plans to release version 3.0 of the platform,
designed to support multicore CPUs from Intel and
Advanced Micro Devices.
RapidMind’s timing couldn’t be better. While the
Moore’s Law–decreed doubling of transistors every 18
months goes on unabated, AMD, IBM, Intel, and others
have determined that all those transistors can’t switch
on and off much faster than they already do. Clock
speeds top out at around 4 gigahertz, beyond which a
microprocessor starts getting hot enough to
spontaneously combust. So instead of making smaller
chips that run faster, the near-term strategy is to keep
chips the same size but put more processor cores in them.
What the Experts Say
Nick Tredennick: “Efforts to extend
standards-based, serial programming languages with
features to describe parallel constructs are
likely to fail. What is more likely to succeed are
languages that raise the level of abstraction in
algorithm description.”
The multicore revolution started several years ago
with graphics processing units (GPUs) made by ATI and
Nvidia. Today, graphics chips sport dozens of cores. Now
other kinds of multicore chips are establishing
themselves in the mainstream: the Cell is already
available in the PlayStation 3 and is moving quickly
into servers, televisions, and other applications. And
four-core CPUs from AMD and Intel are scheduled to ship
within the next few weeks.
There’s more to come: Intel unveiled a prototype chip
with 80 cores in September, part of a research project
whose goal is to create a single chip capable of
processing 1 trillion floating-point operations per second.
Of course, there’s a catch. The tantalizing
possibilities of multicore chips—stunningly realistic
and densely populated games, faster scientific
computations, more accurate modeling of seismic,
medical, and financial data—all depend on the ability of
programmers to routinely solve programming challenges
beyond those they face today. Specifically, programmers
are going to have to write programs that are divided
into parts that run in parallel on several processors
simultaneously, a chore that has proven fiendishly
difficult in the past.
“We’re in a period of pain and turbulence for
application designers,” says Carl Claunch, vice
president of research and advisory services at Gartner
Research, in San Jose. “Trying to do more and more in
parallel adds stress, and we don’t have good tools for
it right now.”
Developers are accustomed to writing programs that execute
functions one after another in serial fashion on one or
maybe two microprocessor cores. Before the debut of the
Cell chip a year ago, parallel programming was largely
confined to niches in high-performance computing and
academic computer science. So until now, programmers
hacking out the multicore version of a game or
three-dimensional simulation have been literally left
to their own devices.
The results aren’t shabby, but they’re far from
optimal. Programmers at Insomniac Games, the Burbank,
Calif., studio behind Resistance: Fall of Man for the
PlayStation 3, had to create their own programming tools
and teach themselves how to allocate different
programming tasks to the Cell’s nine different cores
[see “The Insomniacs,” IEEE Spectrum, December 2006].
Their bootstrapping methods took them only so far,
however. For instance, because their software couldn’t
automatically allocate tasks to whichever core was
available, Insomniac programmers had to dedicate two
cores to handle collisions in situations where carnage
and chaos among men, monsters, and machines needed to be
approximated in real time and in living color.
Hina Shah, director of IBM’s Cell Ecosystem and
Solutions Development unit, has heard from customers
about the new challenges the Cell presents and has a
full-time job seeking solutions to ease their pain.
“Today, if a developer is going to program for Cell
directly, they would have to change relevant parts of
their application and manage all aspects of porting it
to Cell,” she says.
“The nice thing about RapidMind is that you don’t need
to change your whole program,” Shah adds. “You can just
pick parts of your application that should be
accelerated, and instead of changing that code to
program all of Cell’s cores by hand, you simply use a
programming interface that handles a lot of the
complications on its own.”
Theoretically, RapidMind’s platform could help
programmers code their entire applications to run on
multiple cores. In practice, users have fed the
RapidMind platform the most computationally intensive
portions of their programs. The platform accelerates
these chunks by breaking them up into smaller pieces and
running them in parallel on several processor cores at once.
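The general idea of splitting a heavy chunk of work into pieces and handing the pieces to a pool of workers can be sketched in a few lines of Python. This illustrates the pattern only, not the RapidMind platform or its interface: `heavy_kernel`, `split`, and `accelerate` are hypothetical names, and the squaring kernel is a stand-in for real work. The sketch defaults to a thread pool because it runs anywhere; in standard CPython, threads do not speed up CPU-bound code, so real gains would require passing `concurrent.futures.ProcessPoolExecutor` as `pool_cls`.

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_kernel(chunk):
    """Stand-in for the computationally intensive part of a program."""
    return [x * x for x in chunk]

def split(data, pieces):
    """Cut data into at most `pieces` contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // pieces))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def accelerate(data, workers=4, pool_cls=ThreadPoolExecutor):
    """Run heavy_kernel on each chunk concurrently and rejoin the results."""
    chunks = split(data, workers)
    with pool_cls(max_workers=workers) as pool:
        # Executor.map returns results in the order the chunks were given,
        # so the pieces can be concatenated directly.
        results = pool.map(heavy_kernel, chunks)
        return [y for part in results for y in part]
```

The rest of the program calls `accelerate(data)` where it once called the serial kernel. That matches Shah's point that only the parts worth accelerating need to change.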