A new discipline has bloomed at the intersection of
biology and computer science. Called bioinformatics, it
is already so far advanced that many life scientists
spend more time at their computers than they do at the
laboratory bench. They gobble processing power like
peanuts, burying themselves in the massive comparison of
genes and the chemical instructions the genes issue to
the body's cells.
"Massive" took on new meaning in September, when Paul
G. Allen, cofounder of Microsoft Corp., in Redmond,
Wash., unveiled the greatest bioinformatics initiative
yet: a map called the Allen Brain Atlas, indicating
which of 20 000 genes is doing what in the brain and
where it is doing it. The project is being undertaken by
the Allen Institute for Brain Science, in Seattle, Wash.
Allen put up the US $100 million that the three-year
mission is expected to consume, and he has promised to
put the entire thing on the Web, in quarterly
installments, for individual researchers to access free
of charge. Then the real magic will begin:
neuroscientists will sift the data for insight into the
workings of the mind and clues to the causes, and
possibly the cures, of such devastating ailments as
Parkinson's disease, epilepsy, depression,
obsessive-compulsive disorder, alcoholism, and schizophrenia.
These brain disorders mostly resist today's drug
treatments, and because they generally torture people
for years without killing them, they have created a huge
pool of patients whose care costs tens of billions of
dollars—not to mention indirect economic costs, notably
from lost workdays. New ideas for drug therapy are
desperately required; Allen's Atlas promises to provide them.
Up to now the biggest numbers game in biology had been
run by the publicly financed Human Genome Project, which
sequenced each of the three billion letters in the DNA
code for a human being. "I know a neuroscientist who
downloaded the Human Genome Project onto an Apple iPod,"
says Mark Boguski, an M.D. and Ph.D. who is a veteran of
that project and who now directs the Atlas [see photo,
"Cranial Cartographer"]. "But that was 3 gigabytes,
and we will be producing petabytes."
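The 3-gigabyte figure follows from storing each of the three billion DNA letters as one byte of text. DNA has only four letters, so each one can also be packed into 2 bits. The sketch below shows that arithmetic. The function name `pack_bases` and the packing layout are illustrative assumptions, not the format of any particular project.

```python
# Sketch: storage needed for a genome-sized DNA string.
# Assumption: plain text uses 1 byte per letter; packed form uses 2 bits,
# since DNA has only four letters (A, C, G, T).

BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(seq: str) -> bytes:
    """Pack a DNA string at 2 bits per base, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so every byte has 4 slots.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


GENOME_LETTERS = 3_000_000_000      # "three billion letters"
plain_bytes = GENOME_LETTERS        # 1 byte per letter
packed_bytes = GENOME_LETTERS // 4  # 2 bits per letter

print(f"plain text:   {plain_bytes / 1e9:.1f} GB")   # plain text:   3.0 GB
print(f"2-bit packed: {packed_bytes / 1e9:.2f} GB")  # 2-bit packed: 0.75 GB
```

Even the uncompressed genome fits on a music player. The Atlas's image data is a different order of problem.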
The orders-of-magnitude calculation is simple:
multiply 20 000 genes by a trillion neurons. Nobody will
be downloading this mass of data—that's for sure. Drug
companies and other power users that want to get their
arms around the entire data set to apply their own
algorithms will have to pay for special access to the
Atlas computers or to other computers carrying a copy of
all its data.
The project is negotiating with a supplier that it
won't name for 24 servers (or nodes) it wants for its
computer farm, says Brian Crook, a software engineer
who's been with the project for the past six months.
"That's just a start. In my last job, I worked on a
system that had just 15 terabytes in compressed form,
and it required 60 nodes." The Atlas will certainly need hundreds.
It may seem strange that so much more information can
come from a single organ than from a genome that
specifies the entire body, but DNA is really more like a
recipe than a blueprint. It tells you not where every
last component will go but merely what instructions to
follow to get them there. Just as a very few ingredients
can give rise to the delicately detailed swirls of foam
in a soufflé, so can a handful of genes specify the
stupendous complexity of the cerebral cortex—in this
case, the mouse cortex.
Big as the Brain Atlas will be, its managers intend to
start small, by scanning the gray matter of a little
black-furred thoroughbred called the C57BL/6 mouse. It
is a genetically uniform, fast-growing critter that has
the added advantage of having all its genes decoded.
With the transcript of that code—which includes about
as many genes as humans have—the project's directors
hope to get a lot of work done fast and show some
medically valuable results.
Only later will they begin associating structures in
the mouse brain with those in the human one, a
painstaking slog that must await the development of
technology that can safely zero in on the physiology of
individual neurons without first having to kill the subject.
"Paul Allen did not come to us and say, 'Make an atlas
of the mouse brain,' " Boguski says. "But he very
quickly came to realize that the mouse brain was a very
powerful tool." By "us," he means the board of advisers,
an august bunch that includes the linguist Steven
Pinker, professor of psychology at Harvard University in
Cambridge, Mass., and author of The Language
Instinct, and the molecular biologist James
D. Watson, a codiscoverer of the structure of DNA, who
is now president of Cold Spring Harbor Laboratory on
Long Island, N.Y.
Allen was fascinated by the Human Genome Project, and
like many computer people, he had also been interested
in modeling the mind. The brain project fit both
interests and also promised to give a lot of bang for
his philanthropic buck. For several years, such a
project had been on the wish list of the
government-funded National Institutes of Health, in
Bethesda, Md. Only now, however, had the technology
become equal to the task. Methods of speeding and
automating biological research had ripened, the mouse
genome had been sequenced, and the ability to manage
large data sets had matured.
Winner: Allen Brain Atlas
Goal: In three years, create a comprehensive map of the mouse brain showing in which cells each of 20 000 genes is active, and make it available to researchers free of charge on the Web. Later, make an equivalent map of the human brain.
Why It's a Winner: Generating more biological data than any project that has come before, the Atlas will provide the most detailed map of the most complex organ. It will act as a key resource for understanding and then combating many intractable brain disorders.
Organization: Allen Institute for Brain Science
Center of Activity: Seattle, Wash.
Number of People on the Project: 45, going to 100 in two years
Budget: US $100 million provided by Paul Allen
How can a mere mouse yield medical wonders? There are
plenty of physical diseases for which the mouse has
proved to be a useful model, and there is no reason it
cannot uncover the roots of mental disorders as well.
True, a mouse does not have all that much upstairs. Yet,
though the rodents do not suffer from the same sort of
depression that afflicts people—they do not despise
themselves or pray for death—a few seem to experience
something rather like it. A normal mouse, set afloat in
a tank with a hidden underwater perch, will tread water
until its feet find purchase. "Quasidepressive" mice
give up quickly and sink. However, treated with
antidepressants, those same mice will persevere.
The mouse leg of the Allen Brain Atlas is just getting
started in a long, low building in the Seattle area. A
walk through the place gives me a distinct feeling of
déjà vu: multiple office kitchens filled with
appliances, sunny conference rooms strewn with $1000
Aeron executive chairs. Yep, this place was previously
occupied by a dot-com company that apparently closed up
shop before its employees had time to leave a single
coffee stain on the carpet.
As visitors enter the laboratory proper, they pass a
few big pieces of equipment from Germany, still in their
crates; a lot more are on order. People are on order,
too, as Boguski quietly makes clear to a delegation of
British scientists. Advertisements in the science
journal Nature have elicited
a flood of résumés from experts in animal care,
neuroscience, and computer science—the current hiring
rate runs at about three per week.
Right now, just three people are sitting in the lab,
all huddled along a pair of benches. Another 20 or so
have yet to relocate from temporary quarters in the
downtown Seattle offices of Vulcan Ventures Inc., the
investment firm that manages Allen's many biotech
enterprises. By spring, the project should be staffed
with up to about 45 people; in another couple of years,
it will reach 100, including a number of top scientists
holding joint academic appointments.
The basic goal is to show what the brain genes do and
where they do it. Each gene directs the manufacture of a
particular protein. Famous examples of genes identified
by the proteins they make include the one for hemoglobin
(which carries oxygen in the blood) and those for the
enzymes that synthesize the hormones estrogen (which
feminizes women) and testosterone (which turns men into fools).
In the brain, some of the most interesting proteins
are receptors, so called because they sit on the cell
membrane and receive chemical messengers. For instance,
the dopamine D4 receptor detects
the messenger dopamine as it comes in from neighboring
neurons; this receptor is thought to play a role in
schizophrenia, depression, attention-deficit disorder,
and even the penchant for novel experiences.
Ideally, the Atlas would study the proteins directly,
but because proteins are hard to detect in fine detail
and at high production speeds, the Atlas will instead
follow an easier target: messenger RNA, the nucleic
acids that ferry a gene's instructions from the nucleus
to the ribosomes, the structures that translate them
into protein. The technique, known as in situ
hybridization, involves slicing the brain into many thin
sections, putting each one on a slide, and exposing it
to chemicals that attach to the nucleic acids you're
interested in.
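The "chemicals that attach" to one particular message are probes. A probe's sequence is the reverse complement of the target RNA, so the two strands base-pair (A with U, G with C). The sketch below shows only that pairing rule. The sequences are made up, and real probes are much longer than these toy strings.

```python
# Sketch: a probe binds its target because it is the target's
# reverse complement. Toy sequences only.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_probe(mrna: str) -> str:
    """Return the 5'->3' sequence that base-pairs with the given mRNA."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(mrna))


def probe_binds(probe: str, mrna: str) -> bool:
    """True if the stretch the probe pairs with occurs in the mRNA."""
    # The probe's own reverse complement is the sequence it recognizes.
    return antisense_probe(probe) in mrna


target = "AUGGCAUU"
probe = antisense_probe("GGCA")    # designed against one stretch of target
print(probe, probe_binds(probe, target))
```

Because the pairing depends on sequence, a probe for one gene's message leaves the thousands of other messages on the slide alone. That selectivity is what lets each slide report on only a few genes.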
Chemical reactions turn the attached acids a color so
that their location in the brain can be scanned and
digitized. The resulting deluge of data must be
cataloged so that various search algorithms can fit it
all into physiologically meaningful patterns.
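The first step in turning a scanned, colored slide into something a search algorithm can catalog might look like the sketch below. It assumes each digitized slide is a 2-D grid of stain intensities (0 for no color, 255 for darkest) and flags every pixel above a chosen threshold. The grid and the threshold of 128 are hypothetical, and a production pipeline would need far more careful image processing.

```python
# Sketch: flag stained pixels in a digitized slide by thresholding.
# Assumption: a slide is a grid of intensities, 0 (no stain) to 255 (dark).
from typing import List, Tuple

Grid = List[List[int]]


def expressing_pixels(image: Grid, threshold: int) -> List[Tuple[int, int]]:
    """Return (row, col) of every pixel at or above the threshold."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    ]


def fraction_expressing(image: Grid, threshold: int) -> float:
    """Share of the slide's pixels that show the stain."""
    total = sum(len(row) for row in image)
    return len(expressing_pixels(image, threshold)) / total


slide = [
    [0, 10, 200],
    [180, 30, 0],
]
print(expressing_pixels(slide, 128))  # [(0, 2), (1, 0)]
```

A list of stained positions, rather than raw pixels, is what later search and pattern-matching steps would index.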
To get a first pass through 20 000 genes within three
years, the Atlas project will rear mice to a precise
age, then sacrifice them minutes, even seconds, before
cutting their brains into 25-µm slices, about three
cells thick. Speed is of the essence: other organs can
be put on ice, but brains need glucose and oxygen from
second to second or they begin to die. To help freeze
important molecules in place, the workers will inject a
preservative into the mouse while it's still alive and
use the heart to pump the liquid through the brain.
The entire process will be standardized and, to the
extent possible, roboticized to increase productivity. A
slice will be wafted to a slide with a puff of air,
lightly glued to it—rather as a Post-it slip is glued
to a desk—then examined under a microscope and its
image digitized [see photo, "One Down..."].
No brains are being dissected right now, and the
various ways of automating the slicing, dicing,
dissecting, and staining are still being worked out.
It's clear, though, that the factory will have to work
fast. Given a 1.5-cm-long brain and a 25-µm-thick
slice, you get 600 slices per brain; with the ability to
see just three genes per slice and the goal of looking
at 20 000 genes altogether, you require 7000 brains
(perhaps 8000 for good measure). That comes to four
million slices, most of which will have to be processed
in the last two years of the three-year run, when the
system should be generating some 30 000 slides per week.
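The production arithmetic above can be checked step by step. The sketch follows the paragraph's own reasoning: every slice of a given brain is stained for the same three genes, so each brain maps three genes from front to back. The only added assumption is that two years equals 104 weeks.

```python
# Sketch: reproduce the slicing arithmetic from the text.
import math

BRAIN_LENGTH_UM = 15_000    # 1.5 cm
SLICE_THICKNESS_UM = 25
GENES_PER_SLICE = 3
TOTAL_GENES = 20_000
SLIDES_PER_WEEK = 30_000
WEEKS_IN_TWO_YEARS = 104    # assumption: 2 x 52 weeks

slices_per_brain = BRAIN_LENGTH_UM // SLICE_THICKNESS_UM    # 600
# Each brain covers three genes across all of its slices.
brains_needed = math.ceil(TOTAL_GENES / GENES_PER_SLICE)    # 6667
total_slices = brains_needed * slices_per_brain             # 4,000,200
slides_in_two_years = SLIDES_PER_WEEK * WEEKS_IN_TWO_YEARS  # 3,120,000

print(slices_per_brain, brains_needed, total_slices)
print(f"two years at full rate: {slides_in_two_years / total_slices:.0%}")
```

At 30 000 slides a week, the last two years account for about 78 percent of the roughly four million slices. That fits the claim that most of the work falls in that period.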
Already, Baylor College of Medicine, in Houston,
Texas, has produced some sample slides for the
programmers to play with while they optimize the
software [see screen shot, "Brainscape Navigator"].
By thus tackling the information technology (IT)
challenge first, they will have the project ready when
the flood of homegrown data starts pouring in. A slide's
worth of data may not seem like much, coming as it does from
just three genes, but because each gene's protein-making
activity is mapped over two dimensions, the yield comes
to some 50 MB.
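Multiplying 50 MB per slide by about four million slides gives the size of the raw images alone. The sketch below uses Crook's earlier system (15 TB on 60 nodes) as a crude yardstick for node count. That ratio is an assumption: his figure was for compressed data on a different system, and derived data such as 3-D reconstructions would add to the total.

```python
# Sketch: raw image volume and a rough node count.
# Assumptions: decimal units (1 TB = 10**6 MB); the same TB-per-node
# ratio as Crook's earlier 15 TB / 60-node system.
import math

MB_PER_SLIDE = 50
TOTAL_SLIDES = 4_000_000       # "four million slices"
TB_PER_NODE = 15 / 60          # 0.25 TB per node

raw_tb = TOTAL_SLIDES * MB_PER_SLIDE / 1_000_000   # 200.0 TB
nodes = math.ceil(raw_tb / TB_PER_NODE)            # 800 nodes

print(f"raw images: {raw_tb:.0f} TB, roughly {nodes} nodes")
```

Even this lower bound lands in the hundreds of nodes, consistent with Crook's estimate.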
That's the kind of specificity that medical
researchers need. "It's like real estate—what matters
is location, location, location," says Boguski. "It
matters not just what area the molecule's in but what
cell it's in."
Rather than carefully dissect a single slice for
hours, the project will shove many identical slices
through its mill fast, taking care to slice at the same
angle every time. This strategy of trading quantity of
data for quality is as foreign to the painstaking world
of neuroanatomy as it is familiar to that of computer
science. Chess software, for example, incorporates
little chess knowledge but applies it to so many
millions of possible lines of play that it can give
headaches even to Garry Kasparov, the world's top
player.
Neuroanatomists, like chess masters, don't like the
idea of an automated factory beating them at their own
game. "One I talked to said that with 25-micron
sections, we often wouldn't even get the nucleus [the
cell's central DNA archive]," says Boguski. "I asked
him, if it were cheap enough to do the same experiment
1000 times, wouldn't that be better than doing it once,
thoroughly?" A thousand slices should get the nucleus
most of the time.
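Boguski's argument is a matter of repeated trials. If one thin section captures a given nucleus with some probability p, the chance that at least one of n sections does is 1 - (1 - p)^n. The sketch below assumes the sections are independent, which real overlapping tissue may not be. The value p = 0.05 is purely hypothetical.

```python
# Sketch: why many cheap repeats can beat one careful attempt.
# Assumption: each section independently captures the nucleus with
# probability p_single (0.05 below is a made-up value).
import math


def chance_of_capture(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` sections succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


def sections_needed(p_single: float, target: float) -> int:
    """Smallest number of sections giving at least `target` success."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


p = 0.05
print(chance_of_capture(p, 1))      # one careful section: 5% chance
print(chance_of_capture(p, 1000))   # a thousand sections: essentially certain
print(sections_needed(p, 0.99))     # 90 sections for a 99% chance
```

Under these assumptions, a thousand sections is far more than needed. The point holds even if p is much smaller than the value chosen here.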
Data from each two-dimensional slice will be fed to
programs that reconstruct the brain's structures in
three dimensions. Say you have a neuron whose cell
body—containing the nucleus—sits in the middle of the
brain and whose axon—the long, communicating stalk
analogous to an interconnect in an IC—reaches to the
frontal part of the brain. The software will have to
tease out the long, skinny, possibly oblique path frame
by frame.
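Tracing a fiber "frame by frame" can be illustrated with a greedy nearest-neighbor tracker. The sketch assumes each slice has already been reduced to labeled (x, y) positions in micrometers, all registered to a common frame. At each slice it keeps the labeled point closest to the previous one and stops when nothing lies within a chosen step size. The function name and thresholds are hypothetical. Practical tracing software would need to handle branching, gaps, and misalignment.

```python
# Sketch: link labeled points across 25-um slices into a 3-D path.
# Assumption: slices are pre-registered lists of (x, y) points in um.
import math
from typing import List, Tuple

SLICE_THICKNESS_UM = 25.0
Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def trace_path(slices: List[List[Point2D]], start: Point2D,
               max_step_um: float) -> List[Point3D]:
    """Follow a fiber from `start` (in slice 0) through later slices."""
    path = [(start[0], start[1], 0.0)]
    prev = start
    for z_index in range(1, len(slices)):
        candidates = slices[z_index]
        if not candidates:
            break
        nearest = min(candidates, key=lambda p: math.dist(p, prev))
        # Too far away: treat the fiber as having left the traced region.
        if math.dist(nearest, prev) > max_step_um:
            break
        path.append((nearest[0], nearest[1], z_index * SLICE_THICKNESS_UM))
        prev = nearest
    return path


slices = [[(0.0, 0.0)], [(10.0, 5.0), (300.0, 300.0)], [(20.0, 10.0)]]
print(trace_path(slices, (0.0, 0.0), max_step_um=50.0))
```

The oblique drift in x and y from slice to slice is what makes an axon "possibly oblique". The tracker follows it only as long as consecutive points stay close.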
Unlike most other big biology projects, the Atlas will
study genes just as they come up, in no particular
order. "That was a mistake at the Human Genome
Project—the scientists stayed with their favorite areas
and the work never got done," says Boguski. "The real
surprises will come when we look at all genes,
agnostically. Scientists are trained to be
hypothesis-driven, but the Atlas will be data-driven."
Boguski, a pathologist by training, worked in
bioinformatics on the Human Genome Project before either
the project or the field bore those names. He then
continued that work at a bioinformatics company, Rosetta
Inpharmatics LLC, now in Kirkland, Wash. His background
makes him particularly sensitive to another mistake the
Atlas means to avoid: the scanting of bioinformatics.
"The original funders of the Human Genome Project
underestimated the IT element, the National Institutes
of Health have not come to grips with it, and GenSat [a
government mouse-brain anatomy program] paid only for
data production, not for bioinformatics," Boguski says.
That oversight in the Human Genome Project led to a
last-minute scramble in the spring of 2000 to develop a
program that assembled all the various fragments of
genetic code into a mostly coherent whole. "If you can't
use the data, what good is it?" Boguski asks. "We have
twice as many computer science people as biologists
right now, and even when the project reaches full
employment, the ratio will probably still be 1:1."
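Assembling "fragments of genetic code into a mostly coherent whole" can be illustrated with a textbook greedy overlap assembler. It repeatedly merges the pair of fragments with the longest suffix-to-prefix overlap. This is a toy, not the program written in 2000. Real assemblers must also handle sequencing errors, repeats, and fragments contained inside others. The fragments below are made up.

```python
# Sketch: toy greedy assembly by longest suffix-prefix overlap.
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a matching a prefix of b."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> str:
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_a is None:
            break  # no overlaps left; remaining pieces are just joined
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return "".join(reads)


fragments = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(fragments))  # ATTAGACCTGCCGGAATAC
```

Even this toy compares every pair of fragments on each round. That cost grows quickly with millions of fragments, one reason assembly is an IT problem as much as a biological one.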
Lin Chen, a bioinformaticist at the Atlas, is working
in a number of software environments. Strewn around his
workstation are manuals for Red Hat Linux, a distribution
from Red Hat Inc., in Raleigh, N.C., of Linux, the
open-source operating system modeled on Unix. "Unix is
big in bioinformatics because
it's good for big-batch processing," Chen says. "Also, a
lot of the software is written in Perl, which is easy,
fast, and loaded with functions to deal with pattern
search and string search. Most things here, I wrote in
Perl," the programming language developed by linguist
Larry Wall to take in large amounts of data and
manipulate it flexibly—a "Swiss Army chainsaw," as its
devotees call it.
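The "pattern search and string search" Chen describes is typically done with regular expressions. The sketch below uses Python rather than Perl to stay in this article's single example language. The lookahead syntax shown is shared with Perl. It finds every position of a DNA motif, including overlapping hits and degenerate patterns such as `GA[AT]T`. The sequences are made up.

```python
# Sketch: regular-expression motif search over a DNA string.
import re
from typing import List


def find_pattern(sequence: str, regex: str) -> List[int]:
    """Return every start position of `regex` in `sequence`.

    Wrapping the pattern in a zero-width lookahead (?=...) lets
    matches overlap, e.g. "AA" occurs at 0 and 1 in "AAA".
    """
    return [m.start() for m in re.finditer(f"(?={regex})", sequence)]


print(find_pattern("GAATTCAGAATTC", "GAATTC"))  # [0, 7]
print(find_pattern("GATTCGAATC", "GA[AT]T"))    # [0, 5]
```

A bracketed class like `[AT]` stands in for "either base here". Patterns of that kind would be tedious to code by hand but take one line in a regex-friendly language.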
Chen used to work for Celera Genomics, part of Applera
Corp., based in Norwalk, Conn. Celera made itself a big
name (though no money) by sequencing the human genome
faster than the Human Genome Project could manage.
Unlike the Human Genome Project, Celera did not
underestimate the IT challenge: it spent $50 million on
what was one of the largest computing centers outside
government weapons laboratories.
The Atlas governing body, the Allen Institute for
Brain Science, won't make a dime from this, either—it
can't, as it is a not-for-profit organization. However,
that isn't stopping it from acting entrepreneurially.
Boguski is looking for corporate and government money to
extend and enhance the project. "We are not just
building an atlas, we're building a platform that can be
used for other experiments," he says.
One next-generation project would be to study mice in
which certain genes have been disabled, so that their
role in the brain can be deduced. Another would be to
construct maps not of the immediate chemical messengers
of the genes, as the scientists are doing now, but of
the final chemical products—the entire set of proteins
in the brain, its so-called proteome.
Next might be the extension of this static image of
the brain to a more dynamic one, part of the
ever-increasing dimensionality: first a line (the string
of DNA code), then a plane (gene activity in a cross
section), next a rendered solid, and finally, perhaps, a
representation of how the solid structure changes over
time. Of course, to keep Paul Allen happy, the
researchers will endeavor to link this picture of mice
to men, in a point-to-point correspondence between the
brains of the two species.
It is not yet clear how this will be done. One way
might be to tag nucleic acids performing protein
synthesis with magnetic molecules, then to scan the
living brain electromagnetically, as in functional
magnetic resonance imaging. The result, assembled by a
computer, would then be a 3-D depiction of the synthesis
of the protein in question.
"We'd start by scanning small animals noninvasively,
then move to humans," Boguski says. Even if only a few
proteins could be outlined in this fashion, the
resulting information could serve as signposts for the
proper alignment of other data that can be
physiologically linked to it.
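Using a few mapped proteins as "signposts" is, in computational terms, landmark-based alignment. Known landmark pairs fix a transformation, which is then applied to all other data. The sketch below makes a strong simplifying assumption: each axis is mapped independently by a least-squares scale and offset. Real mouse-to-human registration would have to handle rotation and nonlinear warping. The landmark coordinates are invented.

```python
# Sketch: fit a per-axis scale + offset from landmark pairs, then map
# any other point. Assumption: axes are aligned and independent.
from typing import List, Tuple

Point = Tuple[float, float, float]


def fit_axis(src: List[float], dst: List[float]) -> Tuple[float, float]:
    """Least-squares scale and offset so that dst ~ scale * src + offset."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    if var == 0:
        raise ValueError("landmarks must differ along every axis")
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    scale = cov / var
    return scale, mean_d - scale * mean_s


def fit_landmarks(src_pts: List[Point], dst_pts: List[Point]):
    return [fit_axis([p[i] for p in src_pts], [q[i] for q in dst_pts])
            for i in range(3)]


def map_point(params, point: Point) -> Point:
    return tuple(scale * x + offset for (scale, offset), x in zip(params, point))


mouse = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (2.0, 4.0, 1.0)]
human = [(10.0, -1.0, 4.0), (12.0, 5.0, 5.5), (14.0, 11.0, 4.5)]
params = fit_landmarks(mouse, human)
print(map_point(params, (5.0, 5.0, 5.0)))  # approximately (20.0, 14.0, 6.5)
```

The more signposts there are, and the more evenly they are spread, the better such a fit constrains the mapping of everything in between.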
Like a detective's magnifying glass, the Atlas will
aid sight, not confer it. A lot of inspired
pattern-sifting will be required to unravel the common
psychiatric disorders, which mostly stem not from the
actions of a single misbegotten gene but from those of
many genes, all reacting to environmental cues, one
another, and each one's reactions.
"If the cause of a disease is like a needle in a
haystack, then we're making the haystack smaller,"
Boguski says. "There are pathways to disease, and once
you find them, you're well on your way to finding the
ultimate cause."