3 Sequence Analysis

Analyzing your sequence
The goal of this laboratory exercise is to find differences in the rpoB DNA
sequences from rifampicin resistant E. coli and the parental (wild type) E. coli strain
DH5. As you do this, bear in mind that we have only amplified and sequenced the
middle third of the rpoB gene.
You will find the sequence files in a folder on the desktop of the computers in the
lab. Your TA will tell you the name of the folder. The first two parts of the file
names are not distinctive, but the number or alphanumeric code in the middle of the file name
will be your label. “F” means Forward direction (sequenced with the Forward primer)
and “R” refers to the Reverse direction (sequenced with the Reverse primer), that is, the
complementary sequence. The sequencing primers were the same primers used for PCR,
F1240 and R2226. Each sequence has two files, with similar names except one ends in
“.seq.” The file with the .seq extension is simply a text file of the sequence. This file
can be opened with a text editor like Simple Text or Word. The second form has the
“.ab1” extension. This file has the chromatogram, the original sequence data with the
peaks as they came from the machine. The .ab1 files are the ones with which we will be
working today.
In the sequence you should see a series of As, Gs, Cs, and Ts, and you are also likely to see
“Ns”. An “N” means that the sequence for that base was ambiguous and the computer
program could not tell which letter to assign.
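To get a quick feel for read quality before opening Sequencher, you can count the base calls in the text (.seq) version of a read. The following is only a minimal sketch: the function names and the example read are made up for illustration, and it assumes the sequence has been pasted in as a plain string of letters.

```python
from collections import Counter


def summarize_bases(sequence: str) -> dict:
    """Count each base call (A, C, G, T, N) in a sequence string."""
    cleaned = "".join(sequence.split()).upper()  # drop spaces and line breaks
    counts = Counter(cleaned)
    return {base: counts.get(base, 0) for base in "ACGTN"}


def fraction_ambiguous(sequence: str) -> float:
    """Return the fraction of positions called as N (0.0 for an empty string)."""
    cleaned = "".join(sequence.split()).upper()
    return cleaned.count("N") / len(cleaned) if cleaned else 0.0


# Made-up read: Ns cluster at both ends, with one internal N.
example_read = "NNNACGTTGCANGGTCANN"
print(summarize_bases(example_read))
print(round(fraction_ambiguous(example_read), 2))
```

A read with many internal Ns is a low quality read; Ns only at the two ends are normal.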
General Purpose of this lab:
You will use the forward and reverse sequence (which are, of course, complementary)
of your resistant isolate plus a sequence of the wild type RNA polymerase beta subunit.
A program called Sequencher will compare these sequences and allow you to determine:
1. If there is a mutation in the region of the gene you amplified.
2. At which position the mutation is found.
3. If this mutation has been described before.
4. If the mutation results in an amino acid change.
5. If the mutation involves an amino acid that binds directly to rifampicin.
Assembling your DNA sequence and correcting for sequencing errors:
To start, align the forward and reverse sequences of your mutant to form a contig. The
two sequences are complementary, but the program automatically displays the reverse
complement of any DNA sequenced in the reverse direction. Therefore you
should have two identical sequences (do you see why? If not, ask your TA or your
instructor). This means that if you find differences between the Forward and the Reverse
strands, they are due to sequencing errors. In the first part of this exercise, you will use
Sequencher to correct for sequencing errors. To do so, follow the steps below:
1. On the dock (menu on the bottom of the screen) choose the Sequencher icon,
which has colored peaks. The main Sequencher window opens. Say OK to use
the Demo version. Now drag BOTH of your chromatogram files (.ab1, Forward
and Reverse) into the main Sequencher window.
2. For your contig assembly, choose both chromatogram files (both F and R) by
shift-clicking (that is, get them simultaneously highlighted). Then click the
button “Assemble automatically.” Your files will be condensed into a file labeled
“Contig.” Double click on the Contig icon and a window appears. If necessary,
enlarge the window.
3. This overview window shows a diagram of the contig. The red and green lines on
top indicate each sequence (F and R; you can see they run in opposite directions).
The thicker green bar illustrates the extent of the overlapping sequences.
4. Click on the button labeled “Bases” and the aligned sequences appear in the
window. On the top is each individual DNA sequence. Scroll to the right or left
for more sequence. The two sequences are identical because, as stated above, the
program automatically displays complementary sequences for DNA sequenced in
the reverse direction.
5. Look at the two DNA sequences. Notice that the individual sequences do not
overlap at the beginning or the end, and at the two ends of your sequences there
are probably a number of ambiguous “N”s. “N” refers to an unknown base; the
DNA sequencing machine’s software was not able to “call” that base. “Bad”
sequencing is normal at the beginning and the end. Internal “N”s indicate a low
quality sequence: the more internal “N”s, the lower the quality. You may also see
the colon symbol “:” that marks where there is a missing base in one of the
aligned sequences. In the contig consensus sequence you may find not only N,
but letters like S, Y, M, or K. These indicate bases that could be any of two or
three bases, because of differences between the aligned individual sequences. For
the purpose of our lab they are not relevant. Consider them as simple dots.
6. Now let’s look at the actual sequence of interest for our study. Below the Forward
and the Reverse sequences you will see the contig consensus sequence.
a. Move the cursor over the consensus contig and click on a base; the letter
should be outlined with a box and highlighted in blue, as will the same
base in each of the individual sequences. Now click on the button marked
“Chromatogram” or “Show Chromatogram.” The chromatograms for
each of your sequences should appear, with the same base highlighted that
you highlighted in the contig sequence.
7. Now, let’s correct for sequencing errors. Below the consensus sequence you will
find dots () at each nucleotide where disagreement is found between the
individual sequences. As mentioned above, since the two strands must be
identical, any difference corresponds to a technical error. Click on the dot(s) and
look at the chromatogram. The correct base is the one “called” by the best quality
chromatogram, which has nice, clear, separate peaks. Edit the sequence that has
the “wrong” base.
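The logic of the steps above can be sketched in a few lines of Python. This is not how Sequencher itself works internally; it is a simplified illustration that assumes the two reads cover exactly the same region, have the same length, and contain no gaps. The function names and the example reads are made up. The sketch takes the reverse complement of the Reverse read, compares it with the Forward read base by base, and writes a two-base ambiguity letter (such as S, Y, M, or K) into the consensus wherever the reads disagree.

```python
# Complement of each base; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Standard IUPAC letters for a position that could be either of two bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_and_disagreements(forward: str, reverse_read: str):
    """Compare a Forward read with the reverse complement of a Reverse read.

    Returns the consensus string and the 1-based positions that disagree.
    Assumes equal-length, gap-free reads (a simplification).
    """
    reverse_as_forward = reverse_complement(reverse_read)
    if len(forward) != len(reverse_as_forward):
        raise ValueError("this sketch only handles reads of equal length")
    consensus, disagreements = [], []
    pairs = zip(forward.upper(), reverse_as_forward)
    for position, (f_base, r_base) in enumerate(pairs, start=1):
        if f_base == r_base:
            consensus.append(f_base)
        else:
            disagreements.append(position)
            # Anything involving an N (or another letter) falls back to N.
            consensus.append(IUPAC_TWO_BASE.get(frozenset((f_base, r_base)), "N"))
    return "".join(consensus), disagreements


# Made-up reads: the Reverse read carries one base-calling error.
forward_read = "ATGCCGTA"
reverse_read = "TACGACAT"
print(reverse_complement(reverse_read))                          # ATGTCGTA
print(consensus_and_disagreements(forward_read, reverse_read))   # ('ATGYCGTA', [4])
```

Each position reported here corresponds to a dot in the Sequencher window. Deciding which read is right still requires looking at the chromatogram peaks.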
Use Sequencher to search for a mutation:
BEFORE YOU START YOUR ANALYSIS, MAKE SURE SEQUENCHER HAS
THE CORRECT SETTINGS. Pull down the “View” menu. Choose
“Translation”, then “Single stranded”.
8. Now it is time to load the reference sequence. This is the wild type. You will
compare this sequence to the sequence of your resistant colony in order to identify
your mutation. Your TA will instruct you on where to find it. Simply drag it into
the Sequencher main window. Now simultaneously highlight both your contig and the
reference sequence. Then click the button “Assemble automatically.” Click on
the new contig. Now you see all three sequences aligned. Notice that your
reference sequence is much longer than your mutant’s sequences. That is because
we amplified only a fraction of the whole gene.
9. Go to the actual sequences by clicking on “Bases”. Scroll until you see all three
sequences aligned. Click on “Show chromatogram”. NOTE: No chromatogram
will appear for your reference sequence because that sequence is JUST text.
10. Now you have to tell “Sequencher” which sequence you want to use as reference for
correct base numbering. The program will number the bases in all three
sequences according to your chosen reference (the first base in the reference will
be base #1). In the left sequence label box, select the icon corresponding to the
rpoB reference sequence. On the upper menu bar select “sequence,” then scroll
down and select “reference sequence.” Now all the bases in your contig are
numbered by the nucleotide position from the start codon of the rpoB open
reading frame. This makes it easy to find any given nucleotide in the sequence.
If a box appears and asks you if you want to reverse, say yes! Before you continue
make sure that base #1 is the A of the ATG (start codon) of the reference
sequence. If not, call your TA.
11. Next, find your mutation. Ignore the ends of the sequence where there is either
no sequence overlap or there are a number of “N”s near each other. Remember,
“Bad” sequencing is normal at the beginning and the end. In the rest of the
contig, look for disagreements between the sequences, that is, where there is a
dot (•). This dot should be a definite difference between your sequence and the wild
type. NOTE: You might not have mutations in your rpoB sequence. Rare
rifampicin-resistance mutations occur elsewhere in rpoB, outside the region we sequenced.
If you do not have a mutation, use another group's data.
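The comparison in steps 10 and 11 can be sketched as follows. This is an illustration only, not Sequencher's method: it assumes you already know the 1-based reference position where your contig starts (the hypothetical `contig_start` argument), that there are no gaps, and that base #1 of the reference is the A of the ATG. The short sequences are made up.

```python
def find_mismatches(reference: str, contig: str, contig_start: int):
    """List differences between a contig and the reference sequence.

    contig_start is the 1-based reference position of the contig's first base.
    Returns tuples of (reference position, wild type base, contig base).
    Positions called as N in the contig are skipped, because an unknown
    base is not evidence of a mutation.
    """
    contig_end = contig_start + len(contig) - 1
    if contig_start < 1 or contig_end > len(reference):
        raise ValueError("contig does not fit inside the reference")
    mismatches = []
    for offset, base in enumerate(contig.upper()):
        position = contig_start + offset            # numbering follows the reference
        wild_type_base = reference[position - 1].upper()
        if base != "N" and base != wild_type_base:
            mismatches.append((position, wild_type_base, base))
    return mismatches


# Made-up reference (starts with ATG) and a contig covering positions 7 to 15.
reference = "ATGAAACCCGGGTTTACGTAC"
contig = "NCCGAGTTT"
print(find_mismatches(reference, contig, contig_start=7))   # [(11, 'G', 'A')]
```

The position reported is the nucleotide number counted from the start codon, which is the number you record in step 12.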
Translating the DNA sequence to learn the amino acid sequence.
12. On the consensus sequence, highlight the mutated (ambiguous) nucleotide (if you
have one). Record the nucleotide number corresponding to the mutated nucleotide.
Now pull down the “View” menu. Choose “Reference sequence translation.” The
amino acid sequence of the reference sequence will appear below the consensus. The
amino acids are named by the single letter amino acid code. See the table below for
the key to the single letter amino acid code.
A  Ala  Alanine          M  Met  Methionine
C  Cys  Cysteine         N  Asn  Asparagine
D  Asp  Aspartic acid    P  Pro  Proline
E  Glu  Glutamic acid    Q  Gln  Glutamine
F  Phe  Phenylalanine    R  Arg  Arginine
G  Gly  Glycine          S  Ser  Serine
H  His  Histidine        T  Thr  Threonine
I  Ile  Isoleucine       V  Val  Valine
K  Lys  Lysine           W  Trp  Tryptophan
L  Leu  Leucine          Y  Tyr  Tyrosine
13. Concentrate on a stretch of amino acids (5 or 6) surrounding your mutation. Look at
the amino acid sequence published by Campbell et al. and provided in Canvas (PPT
slide entitled Rifampicin binding). Divide the mutation nucleotide number (recorded
previously) by three, rounding up to the next whole number if the result is not a
whole number, and look for the amino acid corresponding to this calculated
number. Compare the stretch of amino acids that surround the calculated one with
the amino acid stretch that surrounds your mutation. Find the one corresponding to
the published sequence and voilà!
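The arithmetic in step 13 can be checked with a short script. The sketch below is illustrative only: the function names and the toy reference are made up, the standard genetic code is assumed, and the reference is assumed to begin at the A of the ATG so that nucleotide 1 is the first base of codon 1. Under that assumption, nucleotides 1 to 3 belong to codon 1, nucleotides 4 to 6 to codon 2, and so on.

```python
# Standard genetic code, built from the usual T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_number(nucleotide_position: int) -> int:
    """Codon (amino acid) number for a 1-based nucleotide position.

    Equivalent to dividing by three and rounding up.
    """
    return (nucleotide_position - 1) // 3 + 1


def describe_mutation(reference: str, nucleotide_position: int, mutant_base: str):
    """Return (codon number, wild type codon, wild type amino acid,
    mutant codon, mutant amino acid) for a single-base substitution."""
    number = codon_number(nucleotide_position)
    start = (number - 1) * 3                        # 0-based index of the codon's first base
    wild_codon = reference[start:start + 3].upper()
    index_in_codon = (nucleotide_position - 1) % 3  # 0, 1, or 2
    mutant_codon = (
        wild_codon[:index_in_codon] + mutant_base.upper() + wild_codon[index_in_codon + 1:]
    )
    return (number, wild_codon, CODON_TABLE[wild_codon],
            mutant_codon, CODON_TABLE[mutant_codon])


# Toy reference (not rpoB): a G-to-A change at nucleotide 11.
reference = "ATGAAACCCGGGTTTACGTAC"
print(codon_number(11))                       # 4
print(describe_mutation(reference, 11, "A"))  # (4, 'GGG', 'G', 'GAG', 'E')
```

In this toy example, the change at nucleotide 11 falls in codon 4 and replaces glycine (G) with glutamic acid (E). The amino acid letters follow the single letter code in the table above; an asterisk marks a stop codon.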
Our Data
1. What is the wild type codon at this position (Hint: USE the DNA genetic code posted
on Canvas)? What is the wild type amino acid at this position (you can use the PPT slide
“Rif Mutations”)? What is the mutant codon? To which amino acid does this codon
correspond (use the genetic code)?
2. Did your mutation result in an amino acid with different chemical properties,
a different size, and/or a different shape from the wild type one (hint: use the amino acid
table)?
3. Was your mutation one of the previously reported ones? Did your mutation affect an
amino acid directly involved in rifampicin binding (Hint: Look at the PPT slide entitled
Rif mutations. Yellow dots refer to amino acids that bind directly to rifampicin, green
dots are amino acids that do NOT bind to rifampicin)?
General
4. From what you learned in class and reviewed today in lab, how does rifampicin
binding affect RNA polymerase function?
5. In the PPT slide entitled Rif mutations, there is a copy of Figure 1 from the Campbell
et al. paper. Please use it to answer the next two questions. The figure shows the amino
acid clusters where rifampicin-resistance mutations have been identified in E. coli and
M. tuberculosis. Some of these mutations affect amino acids that actually bind to the
antibiotic. Others affect amino acids not directly involved with the binding. Formulate a
hypothesis to explain how each kind of mutation may confer antibiotic resistance.
6. In Figure 1, you can also find an alignment of the prokaryotic sequences with three
eukaryotic sequences, including human. Gray boxes indicate evolutionarily conserved
regions. Are all or some of the rifampicin binding sites conserved? Are other regions
highly conserved? Formulate a hypothesis to explain why rifampicin affects pathogens but not their
host.
