JOBIN RAJ

Bonafide Certificate

Certified that this seminar report titled "DNA Computing" is the bonafide work of Jobin Raj, who carried out the work under my supervision.

in COMPUTER SCIENCE & ENGINEERING

COCHIN 682022

NOVEMBER 2008


I thank my seminar guide, Mrs. Greetha S, Lecturer, CUSAT, for her guidance and valuable suggestions. I am indebted to Mr. David Peter, HOD, Division of Computer Science, and the other faculty members for giving me the opportunity to learn and present this seminar. Without the people mentioned above, this seminar could not have been completed successfully. I once again extend my sincere thanks to all of them.

Jobin Raj


DNA (deoxyribonucleic acid) molecules, the material our genes are made of, have the potential to perform certain calculations, particularly massively parallel searches, much faster than the world's most powerful human-built computers. DNA might one day be integrated into a computer chip to create a so-called biochip that could push computers even faster. DNA molecules have already been harnessed to solve complex mathematical problems. Although DNA computers are still in their infancy, they could eventually store billions of times more data than a personal computer. The technology is still in development and was first demonstrated experimentally only in 1994. DNA computers have the potential to take computing to new levels, picking up where Moore's Law leaves off. They are unlikely to run word-processing, e-mail or solitaire programs. Instead, their computing power might be used by national governments to crack secret codes, or by airlines to map more efficient routes. Studying DNA computers may also lead us to a better understanding of a more complex computer: the human brain.

Table of Contents

List of Tables
List of Figures
List of Symbols, Abbreviations and Nomenclature
1 Introduction
2 DNA
2.1 What is DNA?
2.2 Structure of DNA
2.3 Operations on DNA
2.4 DNA as Software
3 Significance of DNA
3.1 DNA: A unique data structure
3.2 Operations in parallel
4 DNA vs. Silicon
5 The Adleman Experiment
6 Conclusion
7 References

List of Tables

Table No.  Title
2.1        Operations on DNA
4.1        Comparison of a DNA Computer and a Conventional Computer
5.1        TSP - City Encoding



List of Figures

Figure No.  Title
2.1         Structure of DNA
5.1         Travelling Salesman Problem - Graph
5.2         TSP - City Encoding
5.3         TSP - Route Encoding
5.4         Gel Electrophoresis
5.5         Affinity Purification


List of Symbols, Abbreviations and Nomenclature

Sl. No.  Item   Definition
1        DNA    Deoxyribonucleic Acid
2        RNA    Ribonucleic Acid
3        mRNA   Messenger RNA
4        A      Adenine
5        T      Thymine
6        G      Guanine
7        C      Cytosine
8        U      Uracil
9        RAID   Redundant Array of Inexpensive Disks
10       TSP    Travelling Salesman Problem


DNA Computing

1. Introduction
In 1994, Leonard M. Adleman solved an unremarkable computational problem with a remarkable technique. A person could solve this problem in a few moments, and an average desktop machine could solve it in the blink of an eye. Adleman, however, took seven days to find a solution. His work was nonetheless exceptional because he solved the problem with DNA. It was a landmark demonstration of computing at the molecular level.

The type of problem that Adleman solved is a famous one. It is formally known as a directed Hamiltonian Path (HP) problem, but it is more popularly recognized as a variant of the so-called "travelling salesman problem". In Adleman's version of the travelling salesman problem, or "TSP" for short, a hypothetical salesman tries to find a route through a set of cities that goes from a given starting city to a given final city and visits each city exactly once. As the number of cities increases, the number of possible routes grows exponentially. No efficient general algorithm is known for this problem (it is NP-complete), so in the worst case finding an exact answer for large instances comes down to brute-force search. TSPs with a large number of cities therefore quickly become computationally expensive, and they are impractical to solve even on the latest supercomputers. Adleman's demonstration involved only seven cities. In some sense this makes it a trivial problem that can easily be solved by inspection.

Nevertheless, his work is significant for a number of reasons. It illustrates the possibility of using DNA to solve a class of problems that is difficult to solve with traditional computing methods. It is an example of computation at the molecular level, a scale that the semiconductor industry may never reach. It demonstrates unique aspects of DNA as a data structure. It also demonstrates that computing with DNA can work in a massively parallel fashion.

In 2001, scientists at the Weizmann Institute of Science in Israel announced that they had manufactured a computer so small that a single drop of water would hold a trillion of the machines.
The devices used DNA and enzymes as their software and hardware, and together they could perform a billion operations a second. Later, the same team, led by Ehud Shapiro, announced a new model of its biomolecular machine. This model no longer requires an external energy source and performs 50 times faster than its predecessor. The Guinness Book of World Records has recognized it as the world's smallest biological computing device.

Many designs for minuscule computers that harness the massive storage capacity of DNA have been proposed over the years. Earlier schemes relied on ATP, a molecule that is a common source of energy for cellular reactions, as their fuel. In the new setup, however, a DNA molecule provides both the initial data and enough energy to complete the computation.

We propose a new class of algorithms to be implemented on a DNA computer. These algorithms are largely insensitive to changes in the initial conditions, which would give DNA computers great extensibility. Knapsack problems are classic examples of problems that this method can solve. Because these problems are NP-complete, solving large instances on conventional electronic computers quickly becomes unrealistic. DNA computers that use this method could solve substantially larger instances because of their massive parallelism.

Division of Computer Science, School of Engineering, CUSAT
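An ordinary computer can imitate the "generate everything, then filter" idea behind DNA algorithms for knapsack-type problems. It enumerates every candidate subset, which corresponds to what a DNA pool would hold at once, discards the overweight ones and keeps the most valuable survivor. The sketch below is a sequential simulation on a hypothetical toy instance, and the function name is illustrative. It is not the specific algorithm referred to above. On silicon it takes time exponential in the number of items, and that cost is exactly what molecular parallelism is meant to absorb.

```python
from itertools import product


def knapsack_generate_and_filter(weights, values, capacity):
    """0/1 knapsack in the 'generate everything, then filter' style.

    Generate: every 0/1 choice vector is a candidate, as if a test tube held
    one strand per possible subset. Filter: drop candidates heavier than
    capacity. Select: keep the most valuable survivor.
    """
    candidates = product((0, 1), repeat=len(weights))  # all 2**n subsets
    feasible = (
        choice for choice in candidates
        if sum(w * x for w, x in zip(weights, choice)) <= capacity
    )
    # The empty subset always survives when capacity >= 0, so max() has input.
    best = max(feasible, key=lambda c: sum(v * x for v, x in zip(values, c)))
    return best, sum(v * x for v, x in zip(values, best))


# Hypothetical toy instance: four items, capacity 5
print(knapsack_generate_and_filter([2, 3, 4, 5], [3, 4, 5, 6], capacity=5))
# prints ((1, 1, 0, 0), 7)
```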


2. DNA

2.1 What is DNA?

DNA (deoxyribonucleic acid) is the primary genetic material in all living organisms. It is a molecule made of two complementary strands wound around each other in a double helix. The strands are connected by base pairs that look like the rungs of a ladder. Each base pairs with only one other: adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). The sequence of each single strand can therefore be deduced from the identity of its partner.

Genes are sections of DNA that code for a defined biochemical function, usually the production of a protein. The DNA of an organism may contain anywhere from a dozen genes, as in a virus, to tens of thousands of genes in higher organisms such as humans. The sequence of bases in a given gene determines the structure of a protein, and the structure of a protein determines its function. The genetic code therefore determines what proteins an organism can make and what those proteins can do. It is estimated that only 1-2% of the DNA in our cells codes for genes. The rest may act as a decoy that absorbs mutations which could otherwise damage vital genes.

mRNA (messenger RNA) relays information from a gene to the protein-synthesis machinery of the cell. mRNA is made by copying the sequence of a gene, with one subtle difference: the thymine (T) in DNA is replaced by uracil (U) in mRNA. Cells can therefore distinguish mRNA from DNA, and they can selectively degrade mRNA without destroying DNA.

The genetic code is the language that living cells use to convert the information in DNA into the information needed to make proteins. A protein's structure, and therefore its function, is determined by the sequence of its amino acid subunits. The amino acid sequence of a protein is in turn determined by the sequence of the gene that encodes it. The "words" of the genetic code are called codons. Each codon consists of three adjacent bases in an mRNA molecule. Combinations of A, U, C and G give 4^3 = 64 different three-base codons, but only twenty amino acids need to be coded for. This excess of codons is known as the redundancy of the genetic code. Because more than one codon can specify the same amino acid, some mutations can occur in the sequence of a gene without affecting the resulting protein.

2.2 Structure of DNA

Fig 2.1 Structure of DNA
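The pairing and transcription rules from Section 2.1 can be expressed as simple string operations. The sketch below is illustrative only. The example gene `ATGGCTTAA` and all function names are hypothetical. `complement` returns the partner strand base by base in the same reading direction. The real partner strand is antiparallel, so read 5' to 3' it would be the reverse of this string.

```python
from itertools import product

# Watson-Crick pairing rules from Section 2.1: A-T and G-C
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base partner strand (same reading direction, see lead-in)."""
    return "".join(DNA_PAIR[base] for base in strand)


def transcribe(gene: str) -> str:
    """Copy a gene into mRNA: same letters, but thymine (T) becomes uracil (U)."""
    return gene.replace("T", "U")


def codons(mrna: str) -> list[str]:
    """Split mRNA into consecutive three-base codons (a trailing partial codon is dropped)."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Every three-letter word over {A, U, C, G}: 4**3 = 64 codons for 20 amino acids
ALL_CODONS = ["".join(c) for c in product("AUCG", repeat=3)]

gene = "ATGGCTTAA"                 # hypothetical example sequence
print(complement(gene))            # TACCGAATT
print(transcribe(gene))            # AUGGCUUAA
print(codons(transcribe(gene)))    # ['AUG', 'GCU', 'UAA']
print(len(ALL_CODONS))             # 64
```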


2.3 Operations on DNA

Annealing
    Denaturing (heating): double-stranded DNA is separated into single strands, because heating breaks the hydrogen bonds between complementary strands.
    Hybridization (cooling): two complementary single-stranded molecules base-pair to form a double-stranded DNA molecule.
Ligation
    Joins DNA molecules together: ligase enzymes concatenate free-floating double-stranded DNA. This step often follows annealing and hybridization.
Replication (Amplify)
    DNA can be replicated, taking a single molecule and multiplying it a thousandfold or more. The Polymerase Chain Reaction (PCR) makes this possible. PCR alternates between two phases: heat separates the DNA into single strands, and a primer with the polymerase reaction converts them back into double strands. PCR rapidly amplifies a single DNA molecule into billions of molecules and makes 2^n copies after n iterations.
Sorting (Gel Electrophoresis)
    Electrophoresis is the movement of charged molecules in an electric field, and it is used as a technique for sorting DNA strands by size. It relies on the fact that DNA molecules are negatively charged. The rate at which molecules migrate through the gel depends on their shape (size) and electrical charge. Smaller molecules migrate faster through the gel, so the strands are sorted by size. The gel is made of agarose, polyacrylamide, or a combination of both.
Filtering (Affinity Purification)
    Isolates DNA that contains a specific sequence from a sample of mixed DNA. The complement of the sequence to be filtered is attached to a substrate such as a magnetic bead, and the beads are mixed with the DNA. DNA that contains the specific sequence hybridizes with its complement on the beads. The beads are then retrieved and the DNA is isolated.

Table 2.1 Operations on DNA

2.4 DNA as Software

Think of DNA as software and enzymes as hardware, and put them together in a test tube. As these molecules react chemically with each other, simple operations are performed as a by-product of the reactions. Scientists tell the devices what to do by controlling the composition of the DNA software molecules. This is a completely different approach from pushing electrons around a dry circuit in a conventional computer.

To the naked eye, the DNA computer looks like a clear water solution in a test tube. There is no mechanical device, and a trillion biomolecular devices could fit into a single drop of water. Results do not show up on a computer screen. Instead, they are analyzed with a technique that allows scientists to see the length of the DNA output molecule. "Once the input, software, and hardware molecules are mixed in a solution it operates to completion without intervention," said David Hawksett, the science judge at Guinness World Records. "If you want to present the output to the naked eye, human manipulation is needed."
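The operations in Table 2.1 can be mirrored by a toy model in which strands are Python strings. This sketch rests on strong simplifying assumptions: strands are read in one direction, pairing is perfect and there are no stochastic losses. The function names are hypothetical and do not correspond to any laboratory protocol or library. The model covers hybridization, denaturing, ligation, ideal PCR growth and sorting by length.

```python
# Toy string model of Table 2.1 (assumption: one reading direction,
# perfect pairing, no losses; real chemistry is stochastic and antiparallel).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def can_hybridize(a: str, b: str) -> bool:
    """Hybridization (cooling): b pairs with a if it is a's base-by-base complement."""
    return len(a) == len(b) and all(PAIR[x] == y for x, y in zip(a, b))


def denature(duplex: tuple[str, str]) -> list[str]:
    """Denaturing (heating): break the base pairs and release both single strands."""
    top, bottom = duplex
    return [top, bottom]


def ligate(*pieces: str) -> str:
    """Ligation: join adjacent DNA pieces end to end into one molecule."""
    return "".join(pieces)


def pcr_copies(initial: int, cycles: int) -> int:
    """Replication (ideal PCR): each cycle doubles the target, giving initial * 2**n."""
    return initial * 2 ** cycles


def gel_sort(strands: list[str]) -> list[str]:
    """Sorting (gel electrophoresis): shorter strands run farther, so order by length."""
    return sorted(strands, key=len)


print(can_hybridize("ATTACGTCG", "TAATGCAGC"))      # True
print(denature(("ATTACGTCG", "TAATGCAGC")))
print(ligate("GCTACG", "CTAGTA"))                   # GCTACGCTAGTA
print(pcr_copies(1, 30))                            # 1073741824
print(gel_sort(["GCTACGCTAGTA", "ATG", "TCGTAC"]))  # ['ATG', 'TCGTAC', 'GCTACGCTAGTA']
```

Thirty ideal cycles turn one molecule into about a billion copies. This matches the "billions of molecules" in the table, although real PCR efficiency is below 100%.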

3. Significance of DNA

3.1 DNA: A unique data structure

The amount of information gathered on the molecular biology of DNA over the last 40 years is almost overwhelming in scope. Instead of getting bogged down in the biochemical and biological details of DNA, we will therefore concentrate on the information relevant to DNA computing.

The data density of DNA is impressive. A string of binary data is encoded with ones and zeros, and in the same way a strand of DNA is encoded with four bases, represented by the letters A, T, C and G, so each base can carry two bits. The bases (also known as nucleotides) are spaced every 0.35 nanometres along the DNA molecule. This gives DNA a remarkable linear data density of about 145 Mbits (roughly 18 Mbytes) per inch. In two dimensions, if you assume one base per square nanometre, the data density is over one million Gbits per square inch. A typical high-performance hard drive stores about 7 Gbits per square inch, a density more than 100,000 times lower.

Another important property of DNA is its double-stranded nature. The bases A and T, and C and G, bind together to form base pairs, so every DNA sequence has a natural complement. For example, if sequence S is ATTACGTCG, its complement S' is TAATGCAGC. S and S' come together (or hybridize) to form double-stranded DNA. This complementarity makes DNA a unique data structure for computation, and it can be exploited in many ways.

Error correction is one example. Errors in DNA happen for many reasons. Occasionally, DNA enzymes simply make mistakes, cutting where they should not or inserting a T for a G. DNA can also be damaged by thermal energy and by ultraviolet radiation from the sun. If an error occurs in one strand of double-stranded DNA, repair enzymes can restore the proper sequence by using the complementary strand as a reference. In this sense, double-stranded DNA is similar to a RAID 1 array, where data is mirrored on two drives and can be recovered from the second drive if errors occur on the first. In biological systems, this facility for error correction keeps the error rate quite low. DNA replication, for example, makes about one error for every 10^9 copied bases, an error rate of 10^-9. (In comparison, hard drives with Reed-Solomon error correction have read error rates of only 10^-13.)

3.2 Operations in parallel

In the cell, DNA is modified biochemically by a variety of enzymes. These are tiny protein machines that read and process DNA according to nature's design. There is a wide variety and number of these "operational" proteins, which manipulate DNA on the molecular level. For example, some enzymes cut DNA and others paste it back together. Other enzymes act as copiers, and still others as repair units. Molecular biology, biochemistry and biotechnology have developed techniques that let us perform many of these cellular functions in the test tube. This cellular machinery, along with some synthetic chemistry, makes up the palette of operations available for computation. A CPU has a basic suite of operations, such as addition, bit-shifting and logical operators (AND, OR, NOT, NOR), that allow it to perform even the most complex calculations. DNA, similarly, has cutting, copying, pasting, repairing and many other operations. Note that in the test tube, enzymes do not work sequentially on one DNA molecule at a time. Rather, many copies of an enzyme can work on many DNA molecules simultaneously. This is the power of DNA computing: it can work in a massively parallel fashion.
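The data-density figures quoted in Section 3.1 follow from a few lines of arithmetic. This sketch assumes two bits per base (four possible bases), the 0.35 nm spacing given in the text and, for the areal figure, one base per square nanometre. The 7 Gbit per square inch hard-drive density is the figure quoted above.

```python
NM_PER_INCH = 25.4e6        # 1 inch = 25.4 mm = 25.4 million nanometres
BASE_SPACING_NM = 0.35      # spacing between bases along the strand (Section 3.1)
BITS_PER_BASE = 2           # four bases {A, C, G, T} -> log2(4) = 2 bits each

# One-dimensional density along a single strand
bases_per_inch = NM_PER_INCH / BASE_SPACING_NM
linear_bits_per_inch = bases_per_inch * BITS_PER_BASE

# Two-dimensional density, assuming one base per square nanometre
bases_per_sq_inch = NM_PER_INCH ** 2
areal_gbits_per_sq_inch = bases_per_sq_inch * BITS_PER_BASE / 1e9

HARD_DRIVE_GBITS_PER_SQ_INCH = 7   # hard-drive figure quoted in the text
ratio = areal_gbits_per_sq_inch / HARD_DRIVE_GBITS_PER_SQ_INCH

print(f"{linear_bits_per_inch / 1e6:.0f} Mbit per inch "
      f"({linear_bits_per_inch / 8 / 1e6:.1f} Mbyte per inch)")
print(f"{areal_gbits_per_sq_inch:,.0f} Gbit per square inch")
print(f"about {ratio:,.0f} times the hard-drive density")
```

The first line comes out at about 145 Mbit (18.1 Mbyte) per inch. The areal density is roughly 1.29 million Gbit per square inch, well over 100,000 times the quoted hard-drive figure.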


4. DNA vs. Silicon

DNA, with its unique data structure and its ability to perform many parallel operations, lets you look at a computational problem from a different point of view. Transistor-based computers typically handle operations sequentially. Multi-processor computers exist, and modern CPUs incorporate some parallel processing, but in the basic von Neumann architecture instructions are generally handled one after another. A von Neumann machine, which is what all modern CPUs are, basically repeats the same "fetch and execute cycle" over and over again. It fetches an instruction and the appropriate data from main memory, and then it executes the instruction. It does this many, many times in a row, really, really fast. The great Richard Feynman, in his Lectures on Computation, summed up von Neumann computers by saying, "the inside of a computer is as dumb as hell, but it goes like mad!" DNA computers, however, are non-von Neumann, stochastic machines. They approach computation in a different way from ordinary computers in order to solve a different class of problems.

Typically, increasing the performance of silicon computing means faster clock cycles (and larger data paths), and the emphasis is on the speed of the CPU rather than on the size of the memory. For example, which gives better performance: doubling the clock speed or doubling your RAM? For DNA computing, though, the power comes from memory capacity and parallel processing. If forced to behave sequentially, DNA loses its appeal. Consider, for example, the read and write rate of DNA. In bacteria, DNA can be replicated at a rate of about 500 base pairs a second. Biologically this is quite fast (10 times faster than in human cells), and given the low error rates it is an impressive achievement. But at two bits per base pair this is only 1000 bits/sec, a snail's pace compared with the data throughput of an average hard drive.

Look what happens, though, if you allow many copies of the replication enzymes to work on DNA in parallel. First of all, the replication enzymes can start on the second replicated strand of DNA even before they have finished copying the first one. As a result, the number of strands being copied, and with it the total replication rate, roughly doubles with each round. After 10 rounds, DNA is being replicated at about 1 Mbit/sec, and after 30 rounds the rate rises to about 1000 Gbits/sec. This is beyond the sustained data rates of the fastest hard drives.

Now consider how you would solve a nontrivial example of the travelling salesman problem (number of cities > 10) with silicon and with DNA. On a von Neumann computer, one naive method would be to set up a search tree, measure each complete branch sequentially and keep the shortest one. Better search algorithms would improve on this, for example by pruning the search tree when a branch being measured is already longer than the best candidate. A method you certainly would not use is to first generate all possible paths and then search the entire list. Why? The entire list of routes for a 20-city problem could theoretically take 45 million GBytes of memory (18! routes with 7-byte words)! In addition, a 100 MIPS computer would take about two years just to generate all the paths, assuming one instruction cycle to generate each path. With DNA computing, however, this method becomes feasible: 10^15 molecules is only on the order of a nanomole of material, a relatively small amount for biochemistry. Also, routes no longer have to be searched through sequentially, because all operations can be done in parallel.
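The route-list and replication-rate figures above can be reproduced with a short calculation. The sketch makes several assumptions, and these are choices made for this sketch rather than facts about any real machine. The start and end cities are fixed, so a 20-city problem has 18! candidate routes. Each route takes 7 bytes, as in the text. The 100 MIPS machine spends one instruction per generated path. The replication rate starts from a single 1000 bit/s copy and doubles every round. The function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AVOGADRO = 6.02214076e23            # molecules per mole


def route_count(cities: int) -> int:
    """Fixed start and end: the (cities - 2) intermediate cities can be permuted."""
    return math.factorial(cities - 2)


def replication_rate(rounds: int, single_copy_bits_per_s: float = 1000.0) -> float:
    """Aggregate copy rate if the number of active copies doubles every round."""
    return single_copy_bits_per_s * 2 ** rounds


routes = route_count(20)                       # 18! routes
memory_gbytes = routes * 7 / 1e9               # 7-byte word per route (assumption)
years = routes / 100e6 / SECONDS_PER_YEAR      # 100 MIPS, one instruction per path
nanomoles = 1e15 / AVOGADRO / 1e-9             # 10**15 strands expressed in nanomoles

print(f"{routes:,} routes, ~{memory_gbytes / 1e6:.0f} million GBytes")
print(f"~{years:.1f} years at 100 MIPS")
print(f"10^15 strands = ~{nanomoles:.1f} nanomol")
print(f"{replication_rate(10) / 1e6:.2f} Mbit/s after 10 rounds, "
      f"{replication_rate(30) / 1e9:,.0f} Gbit/s after 30 rounds")
```

Charging one instruction per city instead of per path would multiply the two-year figure by roughly the number of cities. The text's estimate is therefore sensitive to that assumption.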

Property                  DNA Computers            Conventional Computers
Storage media             Nucleic acids            Semiconductors
Memory capacity           Ultra-high               High
Type of operations        Biochemical operations   Logical operations (and, or, not)
Nature of operations      Parallel                 Sequential
Speed of each operation   Slow                     Fast
Process                   Stochastic               Deterministic

Table 4.1 Comparison of a DNA Computer and a Conventional Computer

5. The Adleman Experiment

There is no better way to understand how something works than to go through an example step by step. So let's solve our own directed Hamiltonian Path problem with the DNA methods demonstrated by Adleman. The concepts are the same, but the example has been simplified to make it easier to follow and present. Suppose that I live in LA and need to visit four cities: Dallas, Chicago, Miami and NY, with NY being my final destination. The airline I'm taking has a specific set of connecting flights that restricts which routes I can take. For example, there is a flight from L.A. to Chicago, but no flight from Miami to Chicago. What should my itinerary be if I want to visit each city only once?

Fig 5.1 Travelling Salesman Problem - Graph

It should take you only a moment to see that there is only one route. Starting from L.A., you need to fly to Chicago, Dallas, Miami and then to N.Y. Any other choice of cities will force you to miss a destination, visit a city twice, or not make it to N.Y. For this example you obviously don't need the help of a computer to find a solution. For six, seven, or even eight cities, the problem is still manageable. However, as the number of cities increases, the problem quickly gets out of hand. Assuming a random distribution of connecting routes, the number of itineraries you need to check increases exponentially. Pretty soon you will run out of pen and paper listing all the possible routes, and it becomes a problem for a computer...


...or perhaps DNA. Adleman solved this problem with essentially the brute-force approach that Chapter 4 dismissed for silicon: he first generated all the possible itineraries and then selected the correct one. This is where DNA has the advantage. It is small, and combinatorial techniques can quickly generate many different data strings. Since the enzymes work on many DNA molecules at once, the selection process is massively parallel. Specifically, the method based on Adleman's experiment would be as follows:

1. Generate all possible routes.
2. Select itineraries that start with the proper city and end with the final city.
3. Select itineraries with the correct number of cities.
4. Select itineraries that contain each city only once.

All of the above steps can be accomplished with standard molecular biology techniques.

Part I: Generate all possible routes

Strategy: Encode city names in short DNA sequences. Encode itineraries by connecting the city sequences for which routes exist.

DNA can simply be treated as a string of data. For example, each city can be represented by a "word" of six bases:

City           DNA encoding
Los Angeles    GCTACG
Chicago        CTAGTA
Dallas         TCGTAC
Miami          CTACGG
New York       ATGCCG

Table 5.1 TSP - City Encoding

The entire itinerary can be encoded by simply stringing together the DNA sequences that represent specific cities. For example, the route L.A. -> Chicago -> Dallas -> Miami -> New York would simply be GCTACG CTAGTA TCGTAC CTACGG ATGCCG. Equivalently, it could be represented in double-stranded form together with its complement sequence.

So how do we generate this? Synthesizing short single-stranded DNA is now a routine process, so encoding the city names is straightforward. The molecules can be made by a machine called a DNA synthesizer, or even custom ordered from a third party. Itineraries can then be produced from the city encodings by linking them together in the proper order. To accomplish this, you can take advantage of the fact that DNA hybridizes with its complementary sequence. A route between two cities can be encoded as the complement of the second half (last three letters) of the departure city joined to the first half (first three letters) of the arrival city. Take the route from Miami (CTACGG) to NY (ATGCCG). The second half of the coding for Miami (CGG) and the first half of the coding for NY (ATG) give CGGATG. Its complement is GCCTAC. This strand uniquely represents the route from Miami to NY. It also connects the DNA representing Miami and NY by hybridizing to the second half of the code for Miami (...CGG) and the first half of the code for NY (ATG...). For example:

Fig 5.2 TSP - City Encoding

Random itineraries can be made by mixing the city encodings with the route encodings. Finally, an enzyme called ligase can connect the DNA strands together. What we are left with are strands of DNA that represent itineraries with a random number of cities and a random set of routes. For example:


Fig 5.3 TSP - Route Encoding

We can be confident that we have all possible combinations, including the correct one, by using an excess of DNA encodings, say 10^13 copies of each city and each route between cities. Remember, DNA is a highly compact data format, so numbers are on our side.

Part II: Select itineraries that start and end with the correct cities

Strategy: Selectively copy and amplify only the section of the DNA that starts with LA and ends with NY by using the Polymerase Chain Reaction.

After Part I, we have a test tube full of DNA strands of various lengths that encode possible routes between cities. What we want are the routes that start with LA and end with NY. To get them, we can use a technique called the Polymerase Chain Reaction (PCR), which produces many copies of a specific DNA sequence. PCR is an iterative process that cycles through a series of copying events using an enzyme called polymerase. Polymerase copies a section of single-stranded DNA starting at the position of a primer. A primer is a short piece of DNA complementary to one end of the section of DNA you are interested in. If you select primers that flank the section of DNA you want to amplify, the polymerase preferentially copies the DNA between them, doubling the amount of DNA that contains this sequence in each cycle. After many iterations of PCR, the DNA you're working on is amplified exponentially. To selectively amplify the itineraries that start and stop with our cities of interest, we therefore use primers for the LA and NY sequences. After PCR, we end up with a test tube full of double-stranded DNA of various lengths, encoding itineraries that start with LA and end with NY.
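Parts I and II can be simulated in software to show how the city words, the route strands and PCR selection fit together. The text does not list the edges of Fig 5.1, so the flight graph below is a hypothetical assumption. It has LA to Chicago, no Miami to Chicago, and exactly one route through every city (LA, Chicago, Dallas, Miami, NY). The "excess of DNA" is modelled by enumerating every chain that the available route strands could splint together, rather than by random ligation. All names are illustrative.

```python
# City "words" from Table 5.1 (six bases each)
CITY_CODE = {
    "LA": "GCTACG", "Chicago": "CTAGTA", "Dallas": "TCGTAC",
    "Miami": "CTACGG", "NY": "ATGCCG",
}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical flights (assumption: Fig 5.1's edges are not given in the text)
FLIGHTS = {
    "LA": ["Chicago", "Dallas"],
    "Chicago": ["Dallas", "NY"],
    "Dallas": ["Miami", "NY"],
    "Miami": ["Dallas", "NY"],
    "NY": [],
}


def complement(seq: str) -> str:
    """Base-by-base complement (A<->T, G<->C)."""
    return "".join(PAIR[b] for b in seq)


def route_strand(src: str, dst: str) -> str:
    """Complement of (last 3 bases of src + first 3 bases of dst): a splint
    whose halves pair with the tail of src and the head of dst."""
    return complement(CITY_CODE[src][3:] + CITY_CODE[dst][:3])


def encode(itinerary) -> str:
    """After ligation an itinerary strand is just the city words in order."""
    return "".join(CITY_CODE[c] for c in itinerary)


# One route strand per available flight
ROUTE_STRANDS = {(s, d): route_strand(s, d) for s, dests in FLIGHTS.items() for d in dests}


def ligation_products(max_cities: int) -> list:
    """Part I stand-in: every chain of cities joined by existing route strands,
    up to max_cities long (all of them assumed present in the excess of DNA)."""
    products = [(city,) for city in CITY_CODE]
    frontier = list(products)
    for _ in range(max_cities - 1):
        frontier = [w + (dst,) for w in frontier
                    for (src, dst) in ROUTE_STRANDS if src == w[-1]]
        products.extend(frontier)
    return products


def pcr_select(tube, start: str = "LA", end: str = "NY") -> list:
    """Part II stand-in: only strands flanked by both primers are amplified."""
    return [w for w in tube if w[0] == start and w[-1] == end]


print(ROUTE_STRANDS[("Miami", "NY")])        # GCCTAC
selected = pcr_select(ligation_products(max_cities=6))
for itinerary in selected:
    print(len(encode(itinerary)), " -> ".join(itinerary))
```

With this graph, eight LA-to-NY itineraries of 18 to 36 bases survive Part II. Only one of them is the five-city answer, which is why the length and city filters that follow are still needed.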


Part III: Select itineraries that contain the correct number of cities

Strategy: Sort the DNA by length and select the DNA whose length corresponds to 5 cities.

Our test tube is now filled with DNA-encoded itineraries that start with LA and end with NY, but the number of cities between LA and NY varies. We now want to select the itineraries that are five cities long. To do this, we can use gel electrophoresis, a common procedure for resolving the size of DNA. The basic principle behind gel electrophoresis is to force DNA through a gel matrix with an electric field. DNA is a negatively charged molecule under most conditions, so an electric field draws it toward the positive potential. However, the charge density of DNA (charge per length) is constant, so long pieces of DNA move as fast as short pieces when suspended in a fluid. This is why a gel matrix is used. The gel is made of a polymer that forms a meshwork of linked strands. The DNA is now forced to thread its way through the tiny spaces between these strands, which slows it down at different rates depending on its length. After running a gel, we typically end up with a series of DNA bands, each corresponding to a certain length. We can then simply cut out the band of interest to isolate DNA of a specific length. Each city is encoded with 6 base pairs of DNA, so the length of an itinerary tells us its number of cities. In this case we would isolate the DNA that is 30 base pairs long (5 cities times 6 base pairs).

Fig 5.4 Gel Electrophoresis
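Part III can be mimicked by ordering strands by length and cutting out the 30-base band. This is a sketch only. The tube contents are a hypothetical mix of LA-to-NY itineraries built from the Table 5.1 city words. In software the physical sort collapses to a sort plus a length filter, and `run_gel` and `cut_band` are illustrative names.

```python
CITY_CODE = {"LA": "GCTACG", "Chicago": "CTAGTA", "Dallas": "TCGTAC",
             "Miami": "CTACGG", "NY": "ATGCCG"}
BASES_PER_CITY = 6


def encode(cities) -> str:
    """Concatenate the six-base city words in itinerary order."""
    return "".join(CITY_CODE[c] for c in cities)


def run_gel(strands: list) -> list:
    """Shorter strands migrate farther through the gel, so bands order by length."""
    return sorted(strands, key=len)


def cut_band(strands: list, n_cities: int) -> list:
    """Excise the band at n_cities * 6 base pairs."""
    target = n_cities * BASES_PER_CITY
    return [s for s in run_gel(strands) if len(s) == target]


# Hypothetical tube after Part II: LA ... NY itineraries of mixed length
tube = [encode(w) for w in [
    ("LA", "Chicago", "Dallas", "Miami", "Dallas", "NY"),
    ("LA", "Chicago", "NY"),
    ("LA", "Chicago", "Dallas", "Miami", "NY"),
    ("LA", "Dallas", "Miami", "NY"),
    ("LA", "Dallas", "Miami", "Dallas", "NY"),
]]
print([len(s) for s in run_gel(tube)])   # [18, 24, 30, 30, 36]
band = cut_band(tube, n_cities=5)
print(len(band))                         # 2
```

Both strands in the 30-base band start in LA and end in NY, but one of them revisits Dallas and skips Chicago. Length alone cannot reject it.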


Part IV: Select itineraries that have a complete set of cities

Strategy: Successively filter the DNA molecules by city, one city at a time. Since the DNA we start with contains five cities, we will be left with strands that encode each city once.

DNA containing a specific sequence can be purified from a sample of mixed DNA by a technique called affinity purification. The complement of the sequence in question is attached to a substrate such as a magnetic bead. The beads are then mixed with the DNA, and DNA that contains the sequence you're after hybridizes with the complement sequence on the beads. These beads can then be retrieved and the DNA isolated.

Fig 5.5 Affinity Purification

So we now affinity purify five times, using a different city complement for each run. For the first run we use L.A.'-beads, where the ' indicates the complement strand, to fish out DNA sequences that contain the encoding for L.A. (because of Part II, this should be all of the DNA). For the next run we use Dallas'-beads, and then Chicago'-beads, Miami'-beads and finally NY'-beads. The order isn't important. If an itinerary is missing a city, it will not be "fished out" during one of the runs and will be removed from the candidate pool. What remains are the itineraries that start in LA, visit each city once and end in NY, which is exactly what we are looking for. If the answer exists, we will retrieve it at this step.
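Part IV can be modelled as repeated affinity runs. Each run keeps only the strands that contain a window pairing perfectly with the bead probe (the complement of one city word). The sketch assumes perfect hybridization and uses a hypothetical two-strand tube left over from Part III. The helper names are illustrative.

```python
CITY_CODE = {"LA": "GCTACG", "Chicago": "CTAGTA", "Dallas": "TCGTAC",
             "Miami": "CTACGG", "NY": "ATGCCG"}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement (A<->T, G<->C)."""
    return "".join(PAIR[b] for b in seq)


def encode(cities) -> str:
    """Concatenate the six-base city words in itinerary order."""
    return "".join(CITY_CODE[c] for c in cities)


def binds(window: str, probe: str) -> bool:
    """A strand window hybridizes to the bead probe if every base pairs."""
    return len(window) == len(probe) and all(PAIR[a] == b for a, b in zip(window, probe))


def fish_out(strands: list, city: str) -> list:
    """One affinity run with city'-beads: keep strands that stick to the probe."""
    probe = complement(CITY_CODE[city])
    k = len(probe)
    return [s for s in strands
            if any(binds(s[i:i + k], probe) for i in range(len(s) - k + 1))]


# Hypothetical tube after Part III: both strands are 30 bp, LA ... NY
tube = [encode(("LA", "Chicago", "Dallas", "Miami", "NY")),
        encode(("LA", "Dallas", "Miami", "Dallas", "NY"))]   # misses Chicago

for city in ["LA", "Dallas", "Chicago", "Miami", "NY"]:      # order does not matter
    tube = fish_out(tube, city)

print(tube)   # ['GCTACGCTAGTATCGTACCTACGGATGCCG']
```

A city word could in principle also appear across the junction between two other words and cause a false catch. Real encodings are chosen to avoid this, and this toy check would not notice it.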


Reading out the answer

One possible way to find the result would be to simply sequence the DNA strands. However, since we already know the sequences of the city encodings, we can use an alternative method called graduated PCR. Here we do a series of PCR amplifications using the primer for L.A. together with a different primer for each city in succession. By measuring the length of DNA in each PCR product, we can piece together the final sequence of cities in our itinerary. For example, we know that the DNA itinerary starts with LA and is 30 base pairs long. If the PCR product for the LA and Miami primers is 24 base pairs long, then Miami is the fourth city in the itinerary (24 divided by 6). If we were careful in our DNA manipulations, the only DNA left in our test tube should be the itinerary encoding LA, Chicago, Dallas, Miami and NY. So if the succession of primers used is LA - Chicago, LA - Dallas, LA - Miami and LA - NY, we would get PCR products with lengths of 12, 18, 24 and 30 base pairs.

Caveats

Adleman's experiment solved a seven-city problem, but two major shortcomings prevent his computation from being scaled up much further. The complexity of the travelling salesman problem does not disappear when a different method of solution is applied: it still increases exponentially. For Adleman's method, what scales exponentially is not the computing time but the amount of DNA. Unfortunately, this places hard restrictions on the number of cities that can be solved. After the Adleman article was published, more than a few people pointed out that using his method to solve a 200-city HP problem would take an amount of DNA that weighed more than the earth. Another factor that limits his method is the error rate of each operation. These operations are not deterministic but stochastically driven (we are doing chemistry here), so each step contains statistical errors. This limits the number of steps you can perform in succession before the probability of producing an error exceeds the probability of producing the correct result. For example, an error rate of 1% per step is fine for 10 iterations, giving less than 10% error, but after 100 iterations the error grows to 63%.
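The graduated-PCR readout and the compounding of per-step errors both reduce to simple arithmetic. The sketch below uses the route found in the example (LA, Chicago, Dallas, Miami, NY) and the Table 5.1 city words. It makes two assumptions: each city word occurs once and does not straddle a junction, and step errors are independent. Under these assumptions the chance of at least one failure in n steps is 1 - (1 - p)^n. The function names are illustrative.

```python
CITY_CODE = {"LA": "GCTACG", "Chicago": "CTAGTA", "Dallas": "TCGTAC",
             "Miami": "CTACGG", "NY": "ATGCCG"}
BASES_PER_CITY = 6


def graduated_pcr(strand: str, start_city: str, cities: list) -> dict:
    """Simulated product length for each primer pair (start_city, city):
    from the start of start_city's word to the end of city's word."""
    start = strand.find(CITY_CODE[start_city])
    return {city: strand.find(CITY_CODE[city]) + BASES_PER_CITY - start
            for city in cities}


def overall_error(per_step_error: float, steps: int) -> float:
    """Chance that at least one of `steps` independent operations goes wrong."""
    return 1 - (1 - per_step_error) ** steps


answer = "GCTACG" "CTAGTA" "TCGTAC" "CTACGG" "ATGCCG"   # LA, Chicago, Dallas, Miami, NY
lengths = graduated_pcr(answer, "LA", ["Chicago", "Dallas", "Miami", "NY"])
print(lengths)          # {'Chicago': 12, 'Dallas': 18, 'Miami': 24, 'NY': 30}
order = sorted(lengths, key=lengths.get)
print(["LA"] + order)   # ['LA', 'Chicago', 'Dallas', 'Miami', 'NY']

print(f"{overall_error(0.01, 10):.1%}")    # 9.6%
print(f"{overall_error(0.01, 100):.1%}")   # 63.4%
```

Dividing each product length by 6 gives the city's position in the itinerary (2, 3, 4 and 5 here). The error figures match the 10% and 63% quoted in the text.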


6. Conclusion
DNA computing may become one of the most significant technologies in the future of engineering. DNA is what makes up your genes, and it stores all the information about you inside your cells. It holds the instructions for what you look like and how you function. Each microscopic cell in your body contains the entire DNA needed to build you, which is a lot of information. DNA has not only huge data-storage potential but also the potential to carry out complicated calculations and solve mathematical problems.

DNA computers are a very new concept, first demonstrated in 1994. In the years since, scientists have already used DNA to solve moderately difficult mathematical problems. DNA computers are still decades away from competing with silicon-based computers, but they may eventually become much more powerful. The first DNA computers will not be like a home PC. Instead, they will be used to solve huge, complicated mathematical problems, such as breaking codes. DNA computers could be thousands of times smaller and more powerful than silicon-based computers. One pound of DNA could store more data than all the electronic devices ever made, and a DNA computer the size of a water droplet could have more computing power than today's most powerful supercomputers. Another advantage of DNA computing over silicon-based computing is the ability to perform parallel calculations. A conventional von Neumann processor handles instructions essentially one at a time, while a DNA computer can carry out an enormous number of calculations simultaneously.


7. References

1. Adleman L. M., Molecular computation of solutions to combinatorial problems, 1994.

2. Paun G., Rozenberg G. and Salomaa A., DNA Computing, Springer, 1998.


