
Folding@Home: An Analysis

Panuelos, Kevin Matthew; Uy, Kyrstyn Kaizzle; Cagampan, Bernadyn Reyes

Abstract
Folding@Home is a massive distributed computing project that enables researchers to create complex protein folding simulations in a matter of seconds, owing to the way the distributed computing system utilizes concepts like virtualized resources, in the form of the program's participants, to do the job.

Categories and Subject Descriptors: C.2.4 [Distributed Systems]: Client/Server

Keywords: Folding@Home, Distributed Systems, Virtualization, Filesystems, Identification

1. Introduction

A distributed system consists of several computers performing their own operations; however, the user views it as a single system in which all operations are performed [10]. Distributed systems appear to their users as a single coherent system; examples include the world wide web, branch office computer networks, and distributed manufacturing systems [6]. According to [6], a distributed system has five main goals: transparency, openness, reliability, performance, and scalability. These goals are emphasized so as to create a system that appears as a single system; is easy to modify, rebuild, or change; and will grow and adjust over time. The primary goal, however, is to achieve higher performance - better and more reliable than that of a single system. Distributed systems allow resource sharing, thus decreasing process turnaround times and increasing the availability of the systems. They allow computers to communicate via a network or shared memory, which makes computations faster, but some mechanism must ensure that the data transfers are reliable [1]. A distributed system can be virtualized so that the addition and removal of resources is invisible, and the scale afforded by the sheer concept of distributed systems in areas such as research is astounding: computations that could not be done by a handful of consumer-grade computers in a server setup can be done by the combined power of hundreds across the globe, allowing researchers to focus on testing solutions and hypotheses that were previously blocked by technical limitations. One system that applies the concepts of distributed systems and breaks down such technical barriers for scientists is Folding@home, a widespread system of computers that people around the globe can join so that researchers can exploit a cumulative amount of computing power to find ways to make human life better and healthier.
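To make the transparency goal concrete, consider a minimal sketch in Python (not part of any of the cited systems): the caller hands work to one facade object and never learns how many workers sit behind it, so resources can be added or removed invisibly.

```python
# A minimal sketch of the "single coherent system" idea: callers submit work
# to one facade object and never see how many workers exist behind it.
# All names here are illustrative; nothing below is Folding@Home code.
from concurrent.futures import ThreadPoolExecutor

class CoherentSystem:
    """Facade that hides the number (and churn) of workers from the user."""
    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def compute(self, task, inputs):
        # The caller sees one "computer"; the pool fans work out transparently.
        return list(self._pool.map(task, inputs))

system = CoherentSystem(workers=8)   # scaling up is invisible to callers
print(system.compute(lambda x: x * x, range(10)))
```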

2. Folding@Home

Folding@home is software developed to simulate the folding of proteins, RNA, and nanoscale synthetic polymers. Using the paradigm of distributed computing, Folding@home is used by scientists to study and learn more about diseases like Alzheimer's, Huntington's, and many cancers [9]. The goal of Folding@home is to produce a concise model, quantitative and qualitative, of protein folding kinetics [5]. Folding@home makes use of other software to make its processes possible. The GPU core uses the OpenMM MD package. Its protein dynamics part is a modified version of TINKER, a powerful molecular dynamics program written by Jay Ponder's laboratory. The Gromacs molecular simulation package was also incorporated into the project; in fact, the group behind Folding@home works hand in hand with the Gromacs developers. Lastly, the network library (Mithral CS-SDK) developed by Cosm was used for the client and server code [9]. The scientific group behind Folding@home is the Pande lab, a laboratory under the departments of Chemistry and of Structural Biology at Stanford University and the Stanford University Medical Center. According to the Folding@home website, in certain areas of the study the efforts of the people in the Pande lab alone are not enough, so they decided to collaborate with other laboratories. To run and improve Folding@home, the Folding@home Consortium (FAHC) comprises the following:

Huang Lab, HKUST
Izaguirre Lab, Notre Dame
Kasson Lab, University of Virginia
Lindahl Lab, Stockholm University
Shirts Lab, University of Virginia
Snow Lab, Colorado State University
Sorin Lab, CSULB
Voelz Lab, Temple University
Zagrovic Lab, Mediterranean Institute for Life Sciences

3. Significance of Folding@Home

For years, researchers have been trying to solve problems in biology through technological simulation of complex scientific processes, but in some areas, like complex biology, there remain problems that are yet to be addressed because the processing speed they require exceeds what a single modern computer can deliver. [5] incorporated the computational paradigm of distributed computing; in particular, their work uses Folding@home as a solution to this processing problem. In the complex study of protein structure, understanding the structure and the processes involved in it, like protein folding, is difficult. Scientists' research is tied not only to the final product but to how the protein got there as well. In this case, proteins that end up misfolded must be analyzed in such a way that the reason for the misfolding can be discovered, thus necessitating that the structures from before the protein misfolded be simulated; this way, the whole protein folding process can be observed from beginning to end. Though the process happens very quickly in the human body in real time, computers take far longer to simulate it. The solution offered by Folding@home is to divide the work over 100,000 multiprocessors to make speedy simulation of protein folding possible [9].
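As an illustration of the divide-the-work idea (the sources cited here do not spell out the Pande lab's actual decomposition), a long simulation can be cut into many small, independent work units that separate machines could process in parallel; the helper below is a hypothetical sketch.

```python
# Illustrative only: divide a large batch of simulation steps into small
# work units that many machines could process independently. This is not
# the Pande lab's actual decomposition, just the general idea in the text.
def make_work_units(total_steps, unit_size):
    """Split `total_steps` of simulated time into independent work units."""
    return [(start, min(start + unit_size, total_steps))
            for start in range(0, total_steps, unit_size)]

units = make_work_units(total_steps=1_000_000, unit_size=5_000)
print(f"{len(units)} units; the first unit covers steps {units[0]}")
```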

Figure 4.1. Infrastructure of Folding@Home

4. The Innards of Folding@Home

4.1 Infrastructure

[7] depicts the abstract process that goes on behind the scenes when researchers use the system to analyze protein folding, as redrawn in figure 4.1. The researcher, who may or may not know much about the infrastructure, initiates the server and queries the database for a piece of information. The database is built up by the server, which is essentially tasked with delegating analysis work to several computers hosted around the globe.
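A hypothetical sketch of the figure 4.1 flow, with invented class and method names: the server populates the database with results delegated to participant hosts, and the researcher only ever queries the database.

```python
# Hypothetical model of the figure 4.1 flow; names are invented for
# illustration and do not come from the Folding@Home codebase.
class Server:
    def __init__(self):
        self.database = {}          # stands in for the results database
        self.participants = ["host-a", "host-b", "host-c"]

    def delegate(self, job_id):
        host = self.participants[job_id % len(self.participants)]
        result = f"trajectory computed on {host}"  # placeholder for real work
        self.database[job_id] = result             # server builds the database

class Researcher:
    def query(self, server, job_id):
        # The researcher needs no knowledge of the hosts behind the server.
        return server.database.get(job_id, "pending")

server = Server()
server.delegate(42)
print(Researcher().query(server, 42))
```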

4.2 Cosm

Figure 4.2. Cosm's Software Stack

Folding@Home relies upon the Cosm Project, which provides an abstraction over various hardware implementations by viewing them as a set of standard protocols [3]. In this setup, the server where Cosm is installed interacts with other computers, or clients, that implement the protocols as defined in the accompanying Client-Server Software Development Kit, or CS-SDK. [3] presents the project's software stack, from the perspective of the server hosting Cosm at the top down to a client computer, in figure 4.2. The distributed computing platform uses the CS-SDK to directly access the lower layers. In the software stack, Cosm FS is the filesystem; Cosm ID handles name management, identification, and authentication; Cosm Comm handles networking; and Cosm Job is the process manager and scheduler. The Cosm Utility Layer implements encryption, language, time & calendar, transformation, synchronization, networking, compression, and other functionality.
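The division of labor among these layers can be mimicked with stand-in classes; the real Cosm layers are C libraries, so everything below is an illustrative model, not the CS-SDK's actual API.

```python
# Stand-in classes mirroring the division of labor described above.
# These are NOT the real Cosm APIs, only an illustrative Python model.
class CosmFS:    # filesystem layer
    def save(self, name, data): print(f"FS: stored {name} ({len(data)} bytes)")

class CosmID:    # naming, identification, authentication
    def authenticate(self, who): print(f"ID: authenticated {who}"); return True

class CosmComm:  # networking layer
    def send(self, host, payload): print(f"Comm: sent {len(payload)} bytes to {host}")

class CosmJob:   # process manager and scheduler
    def schedule(self, host, job): print(f"Job: scheduled '{job}' on {host}")

class CSSDK:
    """The platform talks to the lower layers only through this facade."""
    def __init__(self):
        self.fs, self.id, self.comm, self.job = CosmFS(), CosmID(), CosmComm(), CosmJob()

sdk = CSSDK()
if sdk.id.authenticate("participant-17"):
    sdk.job.schedule("participant-17", "fold fragment 8")
    sdk.comm.send("participant-17", b"work unit bytes")
    sdk.fs.save("result-8.dat", b"energies")
```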

4.3 Client-Server Architecture

Cosm's Distributed Computing Platform relies on a client-server architecture in which the server employs various techniques to delegate a task to a computer, utilizing several, if not all, features of Cosm to interact with these computers. Each computer implementing Cosm's protocols acts as a client that must answer to the server. Folding@Home relies upon this architecture to facilitate exchanges between a participant and the database. When the Stanford server delegates a job to a participant, essentially requesting that the participant computer do a job, the participant responds with the results of the job. With this in mind, the Stanford server acts more like a conventional client, and the participants act like servers.
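That exchange can be modeled in a few lines; this in-process sketch uses invented names and skips the real networking that Cosm Comm would provide.

```python
# Hedged sketch of the delegation exchange: the server hands a job to a
# participant and receives the result back. Real Folding@Home traffic goes
# over the network; this model is purely in-process, with invented names.
class Participant:
    def __init__(self, name): self.name = name
    def run(self, job):
        return {"job": job, "by": self.name, "result": "folded conformation"}

class StanfordServer:
    def __init__(self, participants): self.participants = list(participants)
    def delegate(self, job):
        worker = self.participants.pop(0)   # pick the next idle participant
        self.participants.append(worker)
        return worker.run(job)              # participant answers with results

server = StanfordServer([Participant("pc-1"), Participant("ps3-2")])
print(server.delegate("work unit #1001"))
```

Note how the usual roles invert, as the authors observe: the machine making requests (the Stanford server) behaves like a client, while each participant behaves like a server fulfilling them.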

4.4 Virtualization

Cosm acts as a singular computer, essentially virtualizing a machine consisting of a filesystem, networking, a process manager, and a central processing unit to accomplish a computationally intensive job. The distributed computing platform perceives common user operations, such as saving or retrieving files, as operations sent to a single computer. From the perspective of Folding@Home, the Stanford server can utilize virtualized resources, which are provided by the participants. With these virtualized resources in play, the job scheduler monitors whether a participant may go down and adjusts the assigned jobs accordingly; the scheduler may send the same job to a different computer when it senses that the job has not returned a result because the host computer has gone down, as sketched below. A participant who installs the Folding@Home software on a computer or game console essentially installs an implementation of the Cosm CPU/OS layer, which papers over differences between operating systems and hardware architectures (e.g., x86, 64-bit, ARM). This layer is then accessed by the simulation job sent to the participant's computer to utilize unused processing power.
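A minimal sketch of that reassignment behavior, assuming a simple timeout heuristic (the sources do not specify how the real scheduler detects a downed host):

```python
# Minimal sketch, with invented names, of the reassignment behavior just
# described: if a job's result has not come back within a deadline, the
# scheduler assumes the host went down and sends the same job elsewhere.
import time

class Scheduler:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.inflight = {}                      # job_id -> (host, sent_at)

    def assign(self, job_id, host):
        self.inflight[job_id] = (host, time.monotonic())
        print(f"assigned job {job_id} to {host}")

    def reassign_stale(self, spare_hosts):
        now = time.monotonic()
        for job_id, (host, sent_at) in list(self.inflight.items()):
            if now - sent_at > self.timeout_s:  # host presumed down
                self.assign(job_id, spare_hosts.pop(0))

sched = Scheduler(timeout_s=0.1)
sched.assign(7, "host-a")
time.sleep(0.2)                                 # simulate a silent host
sched.reassign_stale(["host-b"])
```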

4.5 Filesystem

[2] describes the Cosm filesystem as a redundant one: copies of a file are made and spread across several places, and files are divided into chunks of 64 KB. The spread-out, redundant nature of the filesystem makes retrieving files slow in practice, but fault tolerant. Files are periodically verified to check that the hosts containing them are still up and running; otherwise, a copy of a file's chunk is stored on another host. Data is removed two months after it was last verified to avoid so-called ghost files: if a file is updated, an outdated version may still be present on another host, and automatically deleting chunks whose verification has lapsed avoids the consequences of this phenomenon. Still, judging from Folding@Home's infrastructure, it does not seem that Folding@Home utilizes this abstracted filesystem, instead opting to insert job results into a database. In this sense, each participant in Folding@Home does not actually keep the data of the folding simulation job.
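In the spirit of the description in [2], the chunking and replication can be sketched as follows; the verification and two-month expiry machinery is omitted, and all names are invented.

```python
# Illustrative chunking and replication in the spirit of [2]: files are
# split into 64 KB chunks and each chunk is copied onto several hosts.
# The verification/expiry machinery is omitted; names are invented.
CHUNK_SIZE = 64 * 1024
REPLICAS = 3

def chunk(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (the last may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def place(chunks, hosts, replicas=REPLICAS):
    """Spread each chunk onto `replicas` distinct hosts, round-robin."""
    placement = {}
    for idx, _ in enumerate(chunks):
        placement[idx] = [hosts[(idx + r) % len(hosts)] for r in range(replicas)]
    return placement

data = b"x" * 200_000                       # a 200 KB file -> 4 chunks
hosts = ["h1", "h2", "h3", "h4", "h5"]
print(place(chunk(data), hosts))
```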

5. Conclusion

Folding@Home, as a distributed computing system, utilizes the concept of resource virtualization as its primary distinction, and does so to enable the fast computation and formulation of protein folding simulations. A client-server architecture is employed to schedule jobs among hosts, or participants, without concern for the various hardware implementations being emulated. Folding@Home is considered the biggest distributed system built for disease research, with access to one petaflop of computing power [8], a testament to how well-suited distributed systems can be for research, among other applications. [7] cites that the system has already helped make strides in producing incredibly detailed models of proteins in Alzheimer's patients, possibly making the design of a drug imminent. A drug for Huntington's disease has even been proposed directly from the results of a simulation made using Folding@Home, and even more milestones have been posted and maintained on the project's information page [4].

References

[1] Baker, M. An overview of distributed computing, 1997.
[2] Beberg, A. Chapter 1 - Introduction.
[3] Mithral Communications & Design, Inc. The Cosm Project.
[4] Folding@Home. Folding@home diseases studied FAQ, 2012.
[5] Larson, S. M., Snow, C. D., Shirts, M., and Pande, V. S. Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology.
[6] Lee, I. Introduction to distributed systems, 2007.
[7] Pande, V. Folding@home, 2004.
[8] Pande, V. Crossing the petaflop barrier, September 2007.
[9] Pande, V. Folding@home distributed computing, 2012.
[10] van Steen, M., and Tanenbaum, A. Distributed systems, 2010.
