Photo: Roland Halbe/Barcelona Supercomputing Center
Sacred Servers: MareNostrum, Europe’s third most powerful
computer, resides in a chapel in Barcelona.

Here’s how the Kaleidoscope team solves the two-way
wave equation. The first step is to build a rough model
of the subsurface layers. This model comes from initial
preprocessing of the echo data, which reveals where the
waves travel faster, where they are refracted, and so on.
To get a good image, you need a good initial model, so
Repsol geophysicists spend weeks and even months
crafting it.
Next, the 3DGeo codes use that initial model—a 3-D
grid of numbers, just as in the one-way method—to
propagate the echoes, each step of the wave front
calculated using the wave equation running backward in
time. It may sound esoteric, but all this means is that
time values plugged into the equation have a minus sign.
(The method is also known as reverse time migration.)
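
To make the "backward in time" idea concrete, here is a minimal sketch in Python. It is not the 3DGeo code (which is written in C and Fortran and works in 3-D); it assumes a 1-D grid, a simple second-order finite-difference scheme, and made-up velocities and grid spacing. Because the update formula is symmetric in time, stepping backward amounts to swapping the two stored time levels.

```python
import math


def step(u_now, u_prev, velocity, dt, dx):
    """Advance a 1-D wavefield by one time step (second-order leapfrog).

    The two end points are held at zero, a crude boundary chosen only
    to keep the sketch short.
    """
    n = len(u_now)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        laplacian = (u_now[i - 1] - 2.0 * u_now[i] + u_now[i + 1]) / dx ** 2
        u_next[i] = (2.0 * u_now[i] - u_prev[i]
                     + (velocity[i] * dt) ** 2 * laplacian)
    return u_next


def propagate(u_now, u_prev, velocity, dt, dx, n_steps):
    """Apply `step` n_steps times; returns the last two time levels."""
    for _ in range(n_steps):
        u_now, u_prev = step(u_now, u_prev, velocity, dt, dx), u_now
    return u_now, u_prev


# Hypothetical model: 201 grid points, 10 m apart, two velocity layers.
n_points, dx, dt, n_steps = 201, 10.0, 0.0025, 200
velocity = [1500.0 if i < 100 else 2500.0 for i in range(n_points)]
pulse = [math.exp(-((i - 100) / 5.0) ** 2) for i in range(n_points)]

# Forward in time: start from the pulse at rest.
u_end, u_before_end = propagate(pulse, pulse, velocity, dt, dx, n_steps)

# Backward in time: same update, with the two time levels swapped.
recovered, _ = propagate(u_before_end, u_end, velocity, dt, dx, n_steps - 1)

max_error = max(abs(a - b) for a, b in zip(recovered, pulse))
print(f"largest difference from the starting pulse: {max_error:.2e}")
```

Running the wavefield forward and then backward returns the starting pulse up to rounding error, which is the property reverse time migration relies on.
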
The two-way wave equation codes also need to simulate
the propagation of the air gun wave through the grid.
That’s because you generate your image by comparing this
grid of air gun data with the grid of echo data;
wherever the two waves intersect, an echo originated at
that point. These intersections reveal the contours and
interfaces of the surveyed volume.
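
The comparison of the two grids can be sketched as follows. The article does not say which comparison rule the 3DGeo codes use; this example assumes the common zero-lag cross-correlation rule, in which the two wavefields are multiplied point by point and summed over time. The toy snapshots are invented for illustration.

```python
def cross_correlation_image(source_snapshots, receiver_snapshots):
    """Multiply the two wavefields point by point and sum over time.

    Each argument is a list of snapshots (one per time step), and each
    snapshot is a list of amplitudes (one per grid point).
    """
    n_points = len(source_snapshots[0])
    image = [0.0] * n_points
    for src, rec in zip(source_snapshots, receiver_snapshots):
        for i in range(n_points):
            image[i] += src[i] * rec[i]
    return image


# Toy data: 4 time steps on a 5-point grid.
# The air gun pulse travels down the grid, one point per time step.
source = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
# The back-propagated echo occupies these positions at the same times.
receiver = [
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
]

image = cross_correlation_image(source, receiver)
print(image)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The only nonzero value sits at grid point 2, the one place where both pulses are present at the same time step. In this toy model, that is where the echo originated.
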
The Kaleidoscope codes created by 3DGeo consist of
several components, written in C and Fortran, that
basically solve the wave equation for each point in a
spherical wave propagating within the 3-D grid.
Computing each point’s next step in the simulation
requires about 100 floating-point calculations. For a
large seismic survey consisting of 10 000 subsurface
cubes, each a 3-D grid with billions of points, and
requiring tens of thousands of time steps, your
simulation quickly shoots up close to 10 quintillion
(10^19) floating-point
calculations. If you tried to run it on your desktop PC,
it would go on for a century before you got an image
like those Bevc was looking at.
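
The arithmetic behind that estimate can be checked in a few lines of Python. The cube count and the 100 calculations per point come from the figures above; the exact number of points, the number of time steps, and the desktop speed are round-number assumptions chosen to match "billions," "tens of thousands," and a PC of the period.

```python
cubes = 10_000                  # from the text
points_per_cube = 10 ** 9       # "billions of points": assume 1 billion
time_steps = 10 ** 4            # "tens of thousands": assume 10,000
flops_per_point_step = 100      # from the text

total_flops = cubes * points_per_cube * time_steps * flops_per_point_step

# Assumption: a desktop PC sustaining 3 billion calculations per second.
desktop_flops_per_second = 3e9
seconds_per_year = 365.25 * 24 * 3600

years = total_flops / desktop_flops_per_second / seconds_per_year
print(f"total calculations: {total_flops:.0e}")
print(f"desktop run time:   about {years:.0f} years")
```

Under these assumptions the total comes to 10^19 calculations and the desktop run lasts roughly 106 years.
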
Meanwhile, at the Barcelona Supercomputing Center,
other Kaleidoscope researchers are using their expertise
in fluid dynamics and computational mechanics to
fine-tune the 3DGeo codes to run on MareNostrum. The
machine, which comes in at No. 13 in the Top500 ranking
of the world’s fastest computers, has 5120 dual-core
PowerPC processors, 20 TB of central memory, and 400 TB
of disk storage. Built in 2005 by the Spanish
government, MareNostrum resides inside a glass box at
the center of Torre Girona’s nave. (Latin for “our sea,”
Mare Nostrum was the ancient Romans’ name for the Mediterranean.)
Other big oil companies and seismic-imaging firms
probably have computers as powerful as MareNostrum—or
even more powerful. They guard that kind of information
as carefully as the National Security Agency would. “But
those systems are busy with exploration projects, with
not much time for R&D,” says Michael P. Perrone, a
supercomputing expert at IBM, which collaborates with
the Kaleidoscope efforts. “MareNostrum lets the
Kaleidoscope partners test their big algorithms.”
What makes seismic imaging particularly challenging
for supercomputers is the amount of data involved. The
data for one subsurface cube 10 km on a side can reach
several gigabytes, and a typical survey consists of
thousands of such cubes. “We’re working with terabytes of data, and this
means that in the supercomputer we must manage the input
and output of data very carefully,” says José María
Cela, a computer engineering professor at the Technical
University of Catalonia and a BSC researcher.
To overcome this problem, the Kaleidoscope researchers
adopted a divide-and-conquer approach. They divided the
cubes into smaller chunks, each going to one of
MareNostrum’s computing nodes. In one test, 3DGeo
divided a 10-km cube into 512 chunks. MareNostrum took
about a minute to process all of them. If the
supercomputer were to process the cube as a whole using
one node, it would require almost 6 hours.
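
A rough sketch of that divide-and-conquer step is shown below. The article gives only the chunk count, so the 8-by-8-by-8 layout and the grid of 1000 points per side are assumptions, and the function names are hypothetical. A real wave-equation code would also have to exchange boundary values between neighboring chunks at every time step, which this sketch leaves out.

```python
from itertools import product


def split_axis(n_points, n_parts):
    """Split one axis into n_parts contiguous (start, stop) ranges."""
    base, extra = divmod(n_points, n_parts)
    bounds = []
    start = 0
    for part in range(n_parts):
        # Spread any remainder over the first `extra` parts.
        stop = start + base + (1 if part < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decompose(shape, parts_per_axis):
    """Return every chunk as a tuple of per-axis (start, stop) ranges."""
    axes = [split_axis(n, p) for n, p in zip(shape, parts_per_axis)]
    return list(product(*axes))


# Assumed layout: a 1000 x 1000 x 1000 grid cut into 8 x 8 x 8 chunks.
chunks = decompose((1000, 1000, 1000), (8, 8, 8))
cells = sum(
    (x1 - x0) * (y1 - y0) * (z1 - z0)
    for (x0, x1), (y0, y1), (z0, z1) in chunks
)

# Timings quoted in the text: almost 6 hours on one node, about a minute on 512.
speedup = (6 * 60) / 1
efficiency = speedup / len(chunks)

print(len(chunks), "chunks covering", cells, "grid points")
print(f"speedup about {speedup:.0f}x, efficiency about {efficiency:.0%}")
```

Taking the quoted timings at face value, 512 chunks give a speedup of about 360, or roughly 70 percent of the ideal.
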
To speed up the codes even more, the BSC experts came
up with several other strategies. They combed through
the source code by hand for tasks that could run in
parallel. They minimized the exchange of data between
different tasks and hand-optimized all the calculation
routines. Cela says that these changes
have improved the processing speed of the original
Kaleidoscope code by a factor of five and at the same
time reduced memory usage by a factor of two.
But MareNostrum is just a test bed for the
Kaleidoscope algorithms. The goal is to develop codes
for the next generation of supercomputers. Oil
prospectors replace their computers as fast as you
replace your PC, and maybe even more frequently—about
every two years. “It’s really a race,” says Ortigosa,
Kaleidoscope’s project leader. “Before you finish coding
your algorithm there’s already a new hardware, and you
have to start coding again.”
Kaleidoscope’s goal is to develop the algorithm with
tomorrow’s hardware, the Cell processor, in mind. But
programming the
Cell is an entirely new world for most coders. The
processor’s architecture—one main general-purpose
PowerPC core and eight number-crunching units—is so
extraordinary that it requires programmers to rethink
their strategies. That’s why Repsol partnered with BSC,
which has lots of experience with the Cell.
In one initiative, the Spanish researchers are
developing a programming environment dubbed SuperScalar,
which hides the parallelization task from programmers.
It allows them to develop highly parallelized code
without worrying about the data flows among processors.
This past November, BSC and IBM formalized a partnership
to develop a new supercomputer based on the Cell.
Francesc Subirada, associate director of BSC, says that
nobody knows at the moment what this computer will look
like. “But we do have a name for it,” he says. “We call
it MareIncognito.”
The Kaleidoscope Project had its largest software run
late last year. From 3DGeo’s office in California, Bevc
and his team loaded their wave equation codes into
MareNostrum, more than 9000 km away, and turned them
loose on some echo data. Then they waited.
Twenty days later, the supercomputer completed the
task. It was a simulated seismic survey. Instead of
using a real ship to gather real data, 3DGeo re-created
that process in a computer. A virtual ship fired air gun
shots, and virtual hydrophones recorded the echoes. In
contrast with the conditions of a real survey, however,
3DGeo knew the exact geology of the subseabed volume, a
model provided by Repsol. The idea was to apply the wave
equation codes to the simulated echoes and then compare
the resulting image with the known geology to see how
well the codes performed.
The area surveyed was huge: 38 km by 30 km by 15 km,
representing a geological setting much like the Gulf of
Mexico, with complex salt bodies. The simulation
generated 32 TB of data—one of the largest synthetic
data sets in the industry, according to 3DGeo. “I
remember folks at BSC said, ‘You cannot produce that
much data,’ and we said, ‘Yes we can,’ ” Bevc says.
3DGeo considered bringing a copy to its own servers, but
that much data would take two suitcases full of magnetic tapes.
Next 3DGeo used the data to test its codes. It ran
both one-way and two-way wave equation codes. “We know
exactly what the answer should be, so we can see if our
code is right,” Bevc says. The result? “It’s pretty much
dead on,” he says. “We’re able to image things [using
the two-way wave equation] that we weren’t seeing
before, steep salt flanks and such.”
Also last year, the Kaleidoscope Project began its
first production run. It involved real seismic data for
a 500-km² area in the deep
waters of the Gulf of Mexico. Repsol transported 15 TB
stored in hard drives to Barcelona and loaded it into
MareNostrum. How long did it take to image the area?
Repsol won’t say.
The company is a bit cagey about the details because
it doesn’t want to tip its hand to its competitors.
Ortigosa says they’re still analyzing the results and
that sometime this year, based on those images and other
inputs, the company will decide whether to drill or not.
“This is real, not synthetic data, so this time we don’t
know the answer,” he says. “But I’m confident we’ll get
it right.”