
SPOTLIGHT

DISTRIBUTED PROJECTS TACKLE PROTEIN MYSTERY


By Keri Schreiner

A COMPUTING PARADIGM POPULARIZED BY THE SEARCH FOR LIFE IN OUTER SPACE IS NOW BEING DIRECTED AT UNDERSTANDING LIFE FROM THE INSIDE. TWO RECENTLY LAUNCHED PROGRAMS, ONE INDEPENDENT, THE OTHER UNDER THE AUSPICES of Stanford University, have developed SETI@home-like distributed computing programs that rely on lay participants' computing power to help them unlock the biological mystery of how proteins fold.

The folding challenge

Proteins are composed of long chains of amino acids in a particular sequence. DNA in the human genome, which the Human Genome Project has now mapped, determines that sequence. However, specifying amino-acid order alone is basically a blueprint and provides little insight into protein activity.
Scientists have long attempted to understand how and why proteins function as, for example, antibodies or enzymes. One of the central questions addresses protein folding. To carry out particular functions, unfolded proteins, which have millions of potential folded states to choose from, typically find a single correct state within seconds or minutes. The process of finding this specific shape, called a fold, is essentially a form of rapid self-assembly that lets proteins do their work.
When proteins misfold, they form an insoluble clump. When you boil an egg, for example, the egg white's proteins misfold into a familiar and edible mass. Scientists believe that protein misfolding is linked to some types of cancers and to diseases such as cystic fibrosis, mad cow, and Alzheimer's, which is characterized by large deposits of a single, insoluble protein around the degenerating nerve cells.
The Folderol and Folding@home projects are now attempting to shed light on the protein-folding mystery using the screensaver-based distributed computing model popularized by the SETI@home project. Although each project takes a different approach to the folding problem, both rely on ever-increasing PC performance and the willingness of lay users to participate in solving this biological mystery.

JANUARY/FEBRUARY 2001

The Folderol screensaver. The text along the left side of the screen notes the user's name and karma points (top), along with overall project information including the number of trials run, the average of all iteration scores, and the variance among scores.

Folderol

For Scott Legrand, the Folderol project (www.folderol.org) marks his roundabout return to a long-time obsession. After 10 years of protein-folding research, Legrand decided he needed a break and left academia to pursue a hobby that was evolving into a full-blown career: game design. Legrand, Stephanie Wukovitz, and Douglas Engel designed and launched the Battlesphere game, which quickly developed a cult following. Thereafter, the team turned their collaborative energies to Legrand's former professional obsession, and Folderol was born.
"I'd put a lot of the ideas that I had when I left academia on hold, let them simmer for a while," says Legrand, adding that when he was ready to return to the problem, the computing stage was ideally set. "When I left academia, the world was infested with Pentium 133s. When I got around to Folderol, the world was infested with Pentium III 500s, which is about the same level of performance as the high-end workstations I had seven years ago. So now I feel like I have thousands and thousands of high-end workstations at my disposal."
Folderol is currently putting that performance power to work searching for low-energy protein conformations. A conformation is a unique 3D arrangement of a protein's amino acids, reached solely by adjusting the angles between individual amino acids without breaking or creating bonds. The basic idea behind the effort is that because a protein has only a short time to fold, its correct conformation must be both low in energy and easy to reach from an unfolded state. Folderol thus attempts to predict the protein structure by locating low-energy conformations of a test protein, Target 114. It then identifies the conformation with the largest number of highly similar neighbors. Folderol participants run a conformational search consisting of 10,000 independent parallel simulations of the target protein. Once a search is complete, participants return their lowest-energy conformation to the Folderol Web server. In return, they receive a Karma point. However, any single user's hope of finding the right conformation is unlikely to be rewarded.
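The selection step described above, picking the returned conformation that has the most highly similar neighbors, can be illustrated with a small Python sketch. This is not Folderol's code. It assumes, purely for illustration, that each conformation is a list of backbone angles in degrees, that similarity is a root-mean-square angular difference, and that "highly similar" means closer than a 20-degree cutoff. The function names and toy data are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def rms_angle_distance(conf_a, conf_b):
    """Root-mean-square angular difference between two conformations."""
    total = sum(angle_diff(a, b) ** 2 for a, b in zip(conf_a, conf_b))
    return math.sqrt(total / len(conf_a))

def most_supported(conformations, cutoff=20.0):
    """Return (index, neighbor_count) of the conformation with the most
    neighbors closer than `cutoff` degrees RMS. Ties go to the lowest index."""
    best_index, best_count = -1, -1
    for i, ci in enumerate(conformations):
        count = sum(
            1 for j, cj in enumerate(conformations)
            if j != i and rms_angle_distance(ci, cj) < cutoff
        )
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count

# Toy data (assumed): each submission is a list of backbone angles in degrees.
submissions = [
    [-60.0, -45.0, -62.0, -41.0],    # three mutually similar results
    [-58.0, -47.0, -65.0, -39.0],
    [-63.0, -42.0, -59.0, -44.0],
    [-120.0, 130.0, -115.0, 125.0],  # an unrelated outlier
    [60.0, 45.0, 170.0, -170.0],     # another outlier
]
index, neighbors = most_supported(submissions)
print(index, neighbors)  # prints "0 2"
```

The pairwise comparison is quadratic in the number of submissions, which is fine for a toy list but would need a smarter neighbor search at the scale of thousands of participants.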
"The model that rates the energy of a single conformation of a protein is heinously inaccurate. So basically the chances of a single person getting the right answer are extremely low," says Legrand, adding that the hope for a solution resides in the realm of probabilities. "The object is to look at all the common features present in the independent searches. If there are features that tend to emerge everywhere, they're probably features of the correct structure, simply by a probabilistic argument. It's conformational entropy. Not only will the correct answer be of low energy, but it will also be a very easy structure to generate through a conformational search."
Legrand says that the scientific community generally accepts the notion that a native structure must be a very probable result. However, that a computerized conformational search will obey those rules is "pure speculation" on his part. "But I can't think of a reason why it wouldn't."
Folderol currently has more than 3,000 users who participate in the project using a "home brew" distributed software program based on HTTP protocols. Folderol runs in a screen saver and is currently ported to Windows, although volunteer developers are working on a cross-platform port through SourceForge (www.sourceforge.net), a support service for open source developers and projects. The open source ideal forms an important part of the Folderol effort. Legrand says that one of his motives for starting the project was to put the enormous code base he generated in graduate and post-graduate study of the problem to good use.
"My thinking was that I could just give it out, and that people could play with it and start investing in this problem on their own and that maybe there'd be some innovation from the outside," says Legrand. "There are maybe 40 groups in the entire world that are really doing this kind of research. That's a really small crowd for such a big problem."
COMPUTING IN SCIENCE & ENGINEERING

As to security, Legrand says that they initially planned to let participants download dynamically linked libraries for on-the-fly updates. They decided against it when they realized it would be possible for someone to spoof (forge) their server's IP address and launch a rogue executable. All Folderol downloads are thus manual. Legrand doubts that other security issues will pose a problem.
"For someone to spoof our server, they'd have to generate a low-energy conformation of a protein to begin with. And what would happen in the unlikely event that someone did this? We'd just take their data," he says. "We validate the data when we receive it to see that it's low energy. If it doesn't satisfy that, we just throw it out. We did have one person submit a whole bunch of subtly altered conformations, about 500 times, trying to up his karma. But we looked at the energy and saw that it was just nonsense, and bam, we threw it out."
They also threw the participant out. The Folderol Web site keeps a running tally of more honorable participants and their accumulated Karma. Once participants complete the run of protein-folding simulations, the Folderol team will compare the simulation results against experimental results. If the results match, they'll launch a new run using a different protein. If they don't find a match, they'll use the data to improve their model and attempt another set of runs on the target protein.
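The server-side check Legrand describes, re-scoring each returned conformation and discarding anything that is not low in energy, might look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: `toy_energy` is a made-up stand-in for a real energy model, the cutoff value is arbitrary, and the function names and data layout are hypothetical rather than taken from Folderol.

```python
import math

def toy_energy(angles):
    """Hypothetical stand-in for a real energy model: lowest (most negative)
    when every angle sits at -60 degrees."""
    return -sum(math.cos(math.radians(a + 60.0)) for a in angles)

def validate_submission(angles, expected_length, energy_cutoff):
    """Accept a submission only if it is well formed and the server's own
    energy calculation puts it at or below the cutoff."""
    if len(angles) != expected_length:
        return False
    if not all(isinstance(a, (int, float)) and math.isfinite(a) for a in angles):
        return False
    return toy_energy(angles) <= energy_cutoff

def tally_karma(submissions, expected_length=4, energy_cutoff=-3.5):
    """Award one karma point per accepted submission, keyed by user name."""
    karma = {}
    for user, angles in submissions:
        if validate_submission(angles, expected_length, energy_cutoff):
            karma[user] = karma.get(user, 0) + 1
    return karma

# Toy submissions (assumed data): (user name, list of angles in degrees).
incoming = [
    ("alice", [-60.0, -58.0, -63.0, -61.0]),   # genuinely low energy
    ("bob", [-55.0, -70.0, -62.0, -50.0]),     # low energy
    ("mallory", [10.0, 95.0, -170.0, 140.0]),  # nonsense, high energy
    ("mallory", [10.1, 95.0, -170.0, 140.0]),  # subtly altered resubmission
]
print(tally_karma(incoming))
```

The design point is that the server never trusts a client's claimed score. It recomputes the energy itself, so padding a karma total requires producing conformations that are actually low in energy.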
Folding@home

Stanford University's Folding@home project (http://www.stanford.edu/group/pandegroup/Cosm) is the brainchild of Vijay Pande, head of the Department of Chemistry's Pande Group. Pande says that the distributed computing approach forms a necessary step, both toward the group's specific protein-research goals and the more general aims of computational biology.

The Folding@home screen saver features four visualization modes. In this space-filling mode, filled spheres represent the approximate volume that the electrons occupy around each atom. Carbon atoms appear in dark gray, hydrogen atoms in light gray, oxygen atoms in red, and nitrogen atoms in blue. The blue bar (bottom left) denotes the completed fraction of the run, which moves one unit to the right each time the red bar crosses the right of the screen. The red bar typically takes one to two minutes to cross, so even brief runs of the screen saver yield useful results.

"Right now there is a huge gap between what we'd like to do in computational biology and what current supercomputers can do for us," says Pande. "Within our own group, we have hundreds of PCs that we use in clusters, but even that doesn't get us as far as we'd like. We need tens of thousands, hundreds of thousands to answer a lot of the kinds of questions we want to answer. Funding such an approach, a hundred thousand machines, would cost a hundred million dollars. There's no way we're ever going to get that kind of funding."
Similar to other distributed computing projects such as Folderol and SETI@home, the project runs as a screensaver. Pande's team issued the first nonbeta release of the program in early October. It currently has more than 5,000 participants who communicate with the server and each other using networking and platform-independent routines developed by Adam Beberg using his Mithral CS-SDK tools. Beberg's tools, which have just been commercially released, facilitate and coordinate processing between server and client computers. According to Mithral's Adam Pavlacka, the tool kit, which is available free to qualifying academic programs, is like a fill-in-the-blank template that developers can easily incorporate into their own programs.
Folding@home's protein dynamics code is a modified version of Tinker, a molecular dynamics program developed at the Washington University School of Medicine.
Pande says that the group's initial approach aims to study proteins' structural properties to understand the self-assembling dynamics. Their approach has focused on timing. Proteins fold as quickly as a millionth of a second. Although this is very fast on a human time scale, a thousand-fold gap exists between the simulations' nanosecond time scale and the microseconds of the fastest protein folds. Pande and his group's solution was to break the microsecond barrier using distributed dynamics. Their algorithm lets them divide the work between multiple processors, achieving a near-linear speed-up in the number of processors.
To illustrate, Pande uses an analogy: Say that a woman is attempting to pass over a mountain range from one valley to another using a particular path. Now assume that she is blindfolded and that all her attempts to find the route are random. Given these parameters, it would likely take a long time for her to find the route.
"It takes something like a fluke to actually get over the mountaintop. Most of the time is spent just wandering around the valley," says Pande. "That's why proteins take so long to fold. Let's say a given protein folds in 10 microseconds. It's not like if you looked at it after 10 percent of the time it would be 10 percent folded and after 20 percent of the time it would be 20 percent folded. It doesn't work like that at all. For most of the time, it wouldn't be folded at all, then in that last bit of time it would go over."
Pande says that using a brute-force simulation would thus entail a lot of time spent sampling the unfolded state. Folding@home takes advantage of what is otherwise a burden: Rather than blindfolding one person, they use distributed computing to put, say, a thousand blindfolded people in the valley. They thus increase the probability of having one of the blind wanderers randomly traverse the barrier. The successful wanderer then communicates this feat to the other wanderers, who follow the target path over the pass into the next valley. They then begin the task again to search for the next path over the next range until they achieve the protein fold.
"That's what lets us get these trajectories that go over the barriers much more quickly, actually a thousand times more quickly," says Pande. "Our algorithm allows us to use all the computers to generate simulations that tell us about time scales in the microsecond range. We're using all the computers as a whole to get something fundamentally new."
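The statistics behind the blindfolded-wanderer analogy can be checked with a short Python simulation. This is a sketch of the waiting-time argument only, not of Folding@home's algorithm. It assumes each wanderer's crossing time is an independent exponential random variable, which is the simplest model of a rare "fluke" event, and it borrows the 10-microsecond figure and the thousand wanderers from Pande's examples. The function names are hypothetical.

```python
import random

def first_crossing_time(walkers, mean_time, rng):
    """Time until the first of `walkers` independent searchers crosses,
    assuming each crossing time is exponential with mean `mean_time`."""
    return min(rng.expovariate(1.0 / mean_time) for _ in range(walkers))

def average_first_crossing(walkers, mean_time, trials, seed=0):
    """Average the first-crossing time over many repeated experiments."""
    rng = random.Random(seed)
    total = sum(first_crossing_time(walkers, mean_time, rng) for _ in range(trials))
    return total / trials

MEAN_FOLD_TIME_US = 10.0  # Pande's example: a protein that folds in 10 microseconds

one = average_first_crossing(walkers=1, mean_time=MEAN_FOLD_TIME_US, trials=1000)
many = average_first_crossing(walkers=1000, mean_time=MEAN_FOLD_TIME_US, trials=1000)
print(f"1 walker: {one:.2f} us, 1000 walkers: {many:.4f} us, speed-up: {one / many:.0f}x")
```

Under the exponential assumption, the first of M wanderers crosses after an average of 1/M of the single-wanderer time, so a thousand wanderers should show a speed-up of roughly a thousand, in line with the figure Pande quotes. The total computing effort is not reduced. What shrinks is the simulated time each individual machine has to cover.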
Future unfoldings

In addition to public efforts such as Folderol and Folding@home, Entropia, a San Diego-based commercial distributed computing operation, has announced plans to launch a similar protein-folding project. How these efforts will play out in terms of actually closing in on the folding mystery is itself a mystery for now, but there is little doubt that such efforts will continue as both public ventures and private obsessions until the problem is solved.
"Protein folding is one of those holy grail kind of quests," says Legrand. "It's been sitting around for 40 years and it looks like it's really, really simple, but so far, no matter what people do, they don't find the correct answer. It evades everyone. So it hooks you, you get addicted to it. There are a lot of people out there who are addicted to it and will just keep going at it until it's solved."

Keri Schreiner is a freelance writer based in Southern California. Contact her at keri@grooveline.com.
