Comparing Genomes Summary Week 1 Reading Assignment

From Code to Cell Extract
DNA: where our genes determine our jeans.
The central dogma of molecular biology and genetics is DNA → mRNA → protein, a flow of
information between three kinds of biochemical substances in the cell.
DNA (deoxyribonucleic acid) is located in the nucleus of a cell and contains the genetic
information that determines phenotypic characteristics of an organism, like having blue eyes
or red hair. It is made up of small units known as nucleotides, which contain sugars,
phosphate, and different nitrogenous bases.
DNA is copied into mRNA (messenger RNA) by enzymes, the cell's molecular machinery. This
is called transcription, and it leaves the original instructions in DNA intact. mRNA is
transported to the cytoplasm of a cell, which contains ribosomes and more enzymes. There, the
instructions carried by mRNA direct the assembly of amino acids into proteins,
biochemical substances that are vital to cellular function, differentiation, and specialization.
For a person to have blue eyes, their DNA needs to contain a specific variation of a gene. A
gene itself is simply a combination of nucleotide bases making up DNA. The gene responsible
for blue eyes is called OCA2 (Oculocutaneous albinism II). It carries instructions for producing
a protein that regulates the amount of melanin – a pigment responsible for eye, hair, and skin
color. If this regulation leaves less melanin in the iris of a person’s eyes, the eyes
appear blue.
This entire process depends on the transcription of DNA and the translation of mRNA in a cell.
First, the DNA containing instructions on eye color must be converted to mRNA by transcription
and then translated into a protein that regulates melanin and controls eye color.
Transcription is an example of the input-process-output framework of computer science. It is
the process of converting DNA to mRNA, done mainly by an enzyme called RNA polymerase.
Remember how DNA is made up of nucleotides which contain different nitrogenous bases?
These bases are complementary to each other, which means they pair up in fixed
combinations. For example, in DNA the nitrogenous base adenine (A) pairs with the nitrogenous
base thymine (T). RNA polymerase relies on these interactions. The process is like
transcribing a speech. Imagine listening to someone speak and writing down what they say,
like taking notes. You read and then you write.
The enzyme RNA polymerase first reads the instructions in DNA. It does this by attaching to
the DNA, causing its helix to unwind. It moves along the template strand of DNA to “read” its
exposed nitrogenous bases. Then, RNA polymerase “writes.” It pairs each exposed base with its
complementary base by using free nucleotides present in the nucleus. RNA uses uracil (U) in
place of thymine (T), so if A is present on the DNA strand, U is placed opposite it. The
temporary interactions between the complementary bases hold the nucleotides in place while
RNA polymerase catalyzes the formation of the sugar-phosphate bond that attaches them to each
other. As more free nucleotides are held in position and added to this growing strand, we
gain the product mRNA, which retains the initial genetic instructions encoded in DNA. The
process can be seen in the diagram below, along with a more realistic representation in the
animation screenshot.
ITC / ICA / From code to cell Extract Page |1
Figure 1: Diagram of transcription in human cells.
Figure 2. DNA animation (2002-2014) by Drew Berry and Etsuko Uno depicting transcription.1
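The read-then-write step of transcription can be sketched in a few lines of Python. This is only an illustration, not a model of the real enzyme: the function name `transcribe` and the short example sequence are hypothetical, and the template strand is assumed to be written in the order RNA polymerase reads it. Adenine is paired with uracil (U) because RNA uses U where DNA uses thymine (T).

```python
# Base-pairing rules used when building RNA from a DNA template strand.
# RNA contains uracil (U) instead of thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA complementary to a DNA template strand.

    Assumes the template is given in the order it is read, so the
    mRNA comes out in the order it is written.
    """
    mrna = []
    for base in template_strand.upper():   # "read" one exposed base
        if base not in DNA_TO_RNA:
            raise ValueError(f"not a DNA base: {base!r}")
        mrna.append(DNA_TO_RNA[base])      # "write" its complementary base
    return "".join(mrna)


print(transcribe("CACTAG"))  # GUGAUC
```

The input (DNA), process (base pairing), and output (mRNA) map directly onto the function's argument, body, and return value.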
A common thread
Let’s visualize the instructions in DNA as code in a higher-level programming language. A
strand of DNA, once sequenced, is read as the order of its nitrogenous bases. For example,
GTGATCCATGGGGAC is a section of sequenced DNA. mRNA is the conversion of this code, of
these DNA instructions, into machine language. In a cell, the machine consists of enzymes,
which process the instructions to produce proteins. These proteins go on to play roles in how
the human body functions. So, both disciplines use an information carrier (machine
language or mRNA) for initial instructions (code in higher-level language or DNA) that are then
processed and implemented by computers or cellular machinery (enzymes).
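To make the analogy concrete, here is a minimal sketch that carries the example sequence through both steps. It rests on two assumptions the text does not state: that GTGATCCATGGGGAC is the coding strand, and that reading starts at its first base (real translation begins at a start codon). The helper names are hypothetical. The codon table is the standard genetic code, written with one-letter amino acid symbols and `*` for a stop codon.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_strand_to_mrna(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = coding_strand_to_mrna("GTGATCCATGGGGAC")
print(mrna)             # GUGAUCCAUGGGGAC
print(translate(mrna))  # VIHGD
```

Under these assumptions the fifteen bases encode five amino acids: valine, isoleucine, histidine, glycine, and aspartic acid.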
Let’s also consider modularity and debugging in both areas. I can reuse and combine different
functions and modules for complex programs in Python. Similarly, cells can splice the same
mRNA precursor in different ways to produce protein isoforms, and can regulate gene expression
and protein production. In such processes, errors are possible. In programming, they can
cause an entire program to crash. Genetic errors, known as mutations, can lead to disease by
altering protein synthesis. Thankfully, biologists have developed CRISPR gene-editing
technology to help correct them.
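The modularity comparison can be illustrated with a toy Python sketch of splicing. Everything here is hypothetical: the exon names, their sequences, and the `splice` helper are invented for illustration, and real splicing involves far more machinery than joining strings. The point is only that the same reusable pieces, combined differently, give different products.

```python
def splice(exons: dict[str, str], keep: list[str]) -> str:
    """Join the chosen exons, in the given order, into one mRNA sequence."""
    return "".join(exons[name] for name in keep)


# Hypothetical exons, treated like reusable functions in a module.
EXONS = {
    "exon1": "AUGGCU",
    "exon2": "GAUCCA",
    "exon3": "UUUUAA",
}

full_isoform = splice(EXONS, ["exon1", "exon2", "exon3"])
short_isoform = splice(EXONS, ["exon1", "exon3"])   # exon2 skipped

print(full_isoform)   # AUGGCUGAUCCAUUUUAA
print(short_isoform)  # AUGGCUUUUUAA
```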
Crossovers
The interdisciplinary field of computational biology, using computer science approaches and
algorithms for pattern recognition, biological system modelling and data analysis,
demonstrates the successful overlap of the two areas. Initially, it wouldn’t seem so. The
combination of abstract numeric computation and wet, organic, moving living things? A fool’s
bet, maybe. Yet, this pairing has changed both fields. It is possible we might be looking at a
connection as abiding as that of mathematics and physics.

A study in 2010 compared E. coli’s transcriptional regulatory network to the call graph of
the Linux operating system. Using previously mapped interactions of E. coli genes, the
researchers constructed a microbial counterpart to a call graph, a living equivalent to
Linux’s open-source code. Genes were assigned to three categories:
• Master regulator if it switched on one or more genes but was not itself switched on by another gene.
• Middle manager if it was switched on by another gene and switched on others.
• Workhorse if it was switched on but did not switch on others.
Essentially, this categorized genes into those that do essential tasks (workhorses) and those
that direct and control the tasks (master regulators and middle managers).
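The three rules above amount to sorting the nodes of a directed graph by whether they have incoming and outgoing edges. Below is a minimal Python sketch of that idea, not the researchers' actual code. The gene names and the `classify` function are hypothetical, and it assumes a master regulator is a gene that switches on others without being switched on itself, which keeps the three categories from overlapping.

```python
def classify(edges: list[tuple[str, str]]) -> dict[str, str]:
    """Label each node given (regulator, target) pairs.

    Assumption: a master regulator has outgoing edges only, a workhorse
    has incoming edges only, and a middle manager has both.
    """
    regulators = {src for src, _ in edges}   # nodes that switch something on
    targets = {dst for _, dst in edges}      # nodes that are switched on
    roles = {}
    for node in regulators | targets:
        if node in regulators and node in targets:
            roles[node] = "middle manager"
        elif node in regulators:
            roles[node] = "master regulator"
        else:
            roles[node] = "workhorse"
    return roles


# Hypothetical toy network: geneA switches on geneB and geneC, and so on.
toy_network = [
    ("geneA", "geneB"), ("geneA", "geneC"),
    ("geneB", "geneD"), ("geneC", "geneD"), ("geneC", "geneE"),
]
for gene, role in sorted(classify(toy_network).items()):
    print(gene, "->", role)
```

The same function works unchanged on a call graph if each pair is read as (calling function, called function).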
Linux functions were sorted by the same rules, and resulting categories in both systems were
drawn as a network. The picture emerged as in Figure 3.
Figure 3. The hierarchical layout of the E. coli transcriptional regulatory network and the
Linux call graph.2
Both were shown to have a hierarchical layout, but they differed: the E. coli transcriptional
network had a few regulators and many differentiated, specialized functions, whereas the
Linux call graph had many regulators controlling a set of generic functions. The overlapping
functions in Linux, compared to the highly specialized E. coli molecular roadways, indicated
different operational efficiencies. These differences help explain why computers crash and
humans don’t. The explanation lies in how much the functions in each hierarchy overlap:
E. coli’s genes are organized into distinct modules, while the boundaries between Linux’s
functions are blurry. The difference between the two is also emphasized in the evolutionary
histories of the two networks.
E. coli’s older genes are mostly workhorses and have changed relatively little over time, while
Linux’s older functions are more middle managers and master regulators. The overall Linux
network is complex and has been heavily rewritten by programmers. As it has evolved,
programmers haven’t created entirely new workhorse functions but built upon and reused
the existing functions within new modules. This made the Linux network more top-heavy,
with generic workhorse functions overlapping in different modules. When used all at once,
these blurry boundaries cause Linux to be more prone to errors and crashes.
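One rough way to picture this overlap is to count how many different callers each workhorse has. The Python sketch below does that on two invented toy graphs. The node names, the `callers_per_workhorse` helper, and the choice of this particular measure are all assumptions made for illustration, not the metric used in the study.

```python
from collections import defaultdict


def callers_per_workhorse(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Count distinct callers for each node that calls nothing itself."""
    callers = defaultdict(set)
    has_targets = set()
    for src, dst in edges:
        callers[dst].add(src)
        has_targets.add(src)
    return {
        node: len(srcs)
        for node, srcs in callers.items()
        if node not in has_targets        # keep workhorses only
    }


# Hypothetical "Linux-like" graph: many controllers share one generic helper.
linux_like = [("sys_a", "util"), ("sys_b", "util"), ("sys_c", "util")]
# Hypothetical "E. coli-like" graph: each workhorse has a single regulator.
ecoli_like = [("regX", "enzyme1"), ("regX", "enzyme2"), ("regY", "enzyme3")]

print(callers_per_workhorse(linux_like))  # {'util': 3}
print(callers_per_workhorse(ecoli_like))  # {'enzyme1': 1, 'enzyme2': 1, 'enzyme3': 1}
```

In the first graph a fault in the shared helper touches every controller at once; in the second, each workhorse sits inside one module.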

E. coli’s evolutionary pattern has resulted in a more robust and stable network with distinct
modules. This means little change in the workhorse genes, which encode essential proteins,
with a greater degree of evolution in the genes responsible for control and coordination,
which has allowed a flexibility in the ability to respond to environmental pressures and
conditions. It also protects the bacterium from mutations that crop up all the time due to a
high rate of bacterial multiplication.
The evolutionary patterns mostly differ because Linux is the work of human programmers and
E. coli is the product of four billion years of evolution. Nature being the better programmer
works out in humankind’s favor in the end. Our biological design as programmed by nature is
more robust and stable. The design of our products, such as Linux, emphasizes reuse
and speed of response over the stability and robustness found in nature.
Despite technically different shapes and characteristics, the connection between computer
science and biology is clear. The bond between the two is not solely about manipulating data,
underlying mechanisms, or crunching numbers – it’s about embracing a way of thinking.
Computer scientists tackle problems by breaking them down into smaller, manageable
components. At the same time, biologists try to deconstruct the intricate mechanisms of
life. The shared mindset of problem-solving and pattern recognition unites two disciplines I
never thought to reconcile. Let’s consider the concept of “divide and conquer.” Computer
science involves breaking down large problems into smaller subproblems, like splitting code
into cells, modules, and functions to understand the functionality of each piece. Molecular
biology dissects and compartmentalizes biological phenomena to explore underlying
mechanisms of cellular function at the level of a single protein or enzyme. Isolating or
disabling a single biomolecule can produce an observable change in the cell, which reveals
what that component does. Both have
processes that can be mapped, just like with Linux and E. coli. This approach allows the two
fields to tackle complex challenges with clarity and precision, from studying molecular
interactions to deciphering genetic pathways to constructing complex systems.
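As a small programming illustration of “divide and conquer,” here is merge sort in Python. It is a standard textbook algorithm chosen as an example, not something taken from the reading: the list is split in half, each half is sorted on its own, and the two sorted halves are merged. The codon list is just sample input.

```python
def merge_sort(items: list) -> list:
    """Sort a list by splitting it, sorting each half, and merging."""
    if len(items) <= 1:                 # a list of 0 or 1 items is already sorted
        return list(items)

    mid = len(items) // 2
    left = merge_sort(items[:mid])      # divide: solve each half separately
    right = merge_sort(items[mid:])

    merged = []                         # conquer: merge the two sorted halves
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort(["GTG", "ATC", "CAT", "GGG", "GAC"]))
# ['ATC', 'CAT', 'GAC', 'GGG', 'GTG']
```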
The iterative process is how groundbreaking discoveries are made. It took more than a century
of work by many scientists to develop the cutting-edge technology we now possess for gene
editing, from Gregor Mendel (heredity principles) to Rosalind Franklin (DNA X-ray
crystallography) to James Watson and Francis Crick (the DNA double helix). At the frontier now
is CRISPR technology, which was pioneered by Jennifer Doudna and her team at Berkeley in 2012.
The tool is derived from a natural bacterial defense mechanism against bacteriophages, built on
clustered regularly interspaced short palindromic repeats (CRISPR) in bacterial DNA, which let
bacteria remember and destroy viruses by cutting viral DNA. Essentially, it is an immune
system of bacteria against viruses. The basis of the work is Doudna’s 2012 paper in Science,
which spurred the ensuing competition between Feng Zhang (Broad Institute), George Church
(Harvard) and Doudna’s team over how to implement the CRISPR system in editing human genes.
Notably, Doudna worked with Emmanuelle Charpentier in an iterative dance to reach the 2012
discovery. The process started with understanding the CRISPR mechanism. What was already
known were the CRISPR-associated (Cas) enzymes, produced by gene sequences adjacent to the
CRISPR sequences in the bacteria. These enzymes were molecular “scissors,” cutting and pasting
DNA at specific locations and creating short RNA segments (crRNA) to guide them. Doudna’s team
focused on Cas9, trying to make CRISPR-Cas9 cut viral DNA in vitro (in a test tube). The Cas9
enzyme and crRNA were combined in the test tube and, in theory, the crRNA would guide the
enzyme to cut the viral target. When this didn’t work, Emmanuelle Charpentier’s work on
tracrRNA was utilized. The tracrRNA was found necessary for the crRNA to bind to the
enzyme. The iterative process yielded essential information to the functioning of the CRISPR
system. With more experiments and a discovery that Cas9 could be programmed with
different RNAs to cut DNA wherever desired, Doudna’s team found their way to the 2012
invention of single-guide RNA (sgRNA), a combination of the previously studied tracrRNA
and crRNA, which made CRISPR-Cas9 easier to use and reprogrammable as a tool for
humans, a significant feat in humanity’s fight against genetic disease.
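The idea of “programming” Cas9 with a guide can be sketched as a string search in Python. This is a heavily simplified illustration with hypothetical sequences and a hypothetical `find_cut_sites` helper. It also relies on two details that are not in the text above: the commonly used Cas9 from Streptococcus pyogenes only cuts where the matching DNA is immediately followed by a short “NGG” motif (the PAM), and it cuts about three bases before that motif. Only one DNA strand is searched and mismatches are not tolerated, unlike in a real cell.

```python
def find_cut_sites(dna: str, guide_rna: str) -> list[int]:
    """Return the index of the first base after each predicted cut.

    Simplifying assumptions: exact matches only, one strand only,
    and an 'NGG' motif required right after the matched sequence.
    """
    dna = dna.upper()
    target = guide_rna.upper().replace("U", "T")   # DNA spelling of the guide
    sites = []
    start = dna.find(target)
    while start != -1:
        end = start + len(target)
        pam = dna[end:end + 3]
        if len(pam) == 3 and pam[1:] == "GG":      # N can be any base
            sites.append(end - 3)                  # cut ~3 bases before the PAM
        start = dna.find(target, start + 1)
    return sites


# Hypothetical 20-base guide and a DNA string containing it twice.
guide = "GACGUUAGCCAUGCAAGUCC"
site = "GACGTTAGCCATGCAAGTCC"
dna = "AT" + site + "AGG" + "TTT" + site + "ACT" + "CC"

print(find_cut_sites(dna, guide))  # [19]
```

Swapping in a different guide string changes where the function reports a cut, which is the sense in which the tool is reprogrammable. The second copy of the sequence is ignored because it is followed by ACT rather than NGG.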
