
E. coli Expression Systems

Why do we need proteins?

• Proteins are used as therapeutics, e.g. growth hormones, insulin and vaccines.
• Proteins are used as enzymes for various purposes, e.g. restriction enzymes, polymerases,
proteases and lipases.
• For research (structure-function studies), a good amount of protein at high purity is
needed to characterize that protein.

Desirable features of protein expression

Choosing an appropriate method for expressing a recombinant protein is a critical
factor in obtaining the desired yield and quality of the protein in a timely fashion.
Selecting the wrong expression host can result in a protein that is misfolded, poorly
expressed, lacking the necessary post-translational modifications or carrying
inappropriate ones. Factors to consider when selecting an expression system include the
mass of the protein, the number of disulfide bonds, the type of post-translational
modifications desired on the expressed protein, and the destination of the expressed
protein. The intended application of the purified recombinant protein is also critical in
the decision-making process. Applications can be grouped into four broad areas:
structural studies, in vitro activity assays, antigens for antibody generation, and in
vivo studies.

Therefore, the desirable features of protein expression are:

• High yield
• Proper folding
• Appropriate post-translational modifications
• Correct destination of the expressed protein

A guideline helps the investigator choose an appropriate expression system. However, even
with the guidelines described here, it is often not obvious which expression system is the
best choice, and several expression systems may have to be tried before an optimal one is
identified. Four expression systems are mainly used in laboratories: Escherichia coli,
Pichia pastoris, baculovirus/insect cell, and mammalian cell systems.

Why E. coli?
Among these systems, E. coli remains the most widely used host for recombinant
protein expression. E. coli is easy to transform, grows quickly in simple media, and
requires inexpensive equipment for growth and storage. Also in most cases, E. coli can be
made to produce adequate amounts of protein suitable for the intended application and
there are relatively straightforward methods to scale-up bioproduction.

And Why Not?

No system is without its flaws. Bacteria lack most of the post-translational modification
machinery of eukaryotic cells, so proteins expressed in E. coli will generally not be
modified. High-level overexpression also very often causes aggregation, which can be
difficult to overcome. The causes are multiple and most likely a combination of
factors, such as the lack of the correct chaperones, the high rate of expression and the
absence of obligatory interaction partners.

COMPONENTS OF E. COLI EXPRESSION SYSTEMS

1. Expression Vector

Vectors for expression in E. coli contain, at a minimum, the following elements:

• A transcriptional promoter.
• A ribosome binding site.
• A translation initiation site.
• A selective marker (e.g., antibiotic resistance).
• An origin of replication.

In general, anything that affects these elements can affect the level of protein
expression.

The picture above shows a diagram of a typical expression vector, with an expression
cassette containing all the elements needed for regulated, high-level expression of a
protein in E. coli.

Critical elements of the expression vector are discussed below.

i. Promoter

Promoter Strength
A strong promoter may not be best for all situations. Overproduction of RNA may
saturate translation machinery, and maximizing RNA synthesis may not be desirable or
necessary. A weaker promoter may actually give higher steady-state levels of soluble,
intact protein than one that is rapidly induced.

Inducibility
Foreign proteins can be toxic when over-expressed. An inducible promoter keeps gene
expression off until it is time to turn it on. Such promoters can be:

- Chemically inducible (e.g. by IPTG [isopropyl-β-D-thiogalactoside] or arabinose)
- Heat-inducible

Leaky Promoters
Most promoters have some background activity. Promoters regulated by the lactose
operator/repressor will drive a small amount of transcription in the absence of added
inducer (e.g., IPTG). To minimize this leakage, 10% glucose can be added to the medium
to repress the lactose induction pathway, the growth temperature can be reduced to 15 to
30°C, and a minimal medium that contains no trace amounts of lactose can be used.
Promoter leakage is only a problem when the expressed protein is highly toxic to the
cells.

Types of Promoters

Promoters used in E. coli expression vectors can be divided into three categories
depending on their origin and mode of function:

E. coli native (or built from native elements): lac, trp, tac, trc, ara

Viral, but recognized by E. coli RNA polymerase: λ PL, λ PR, T5

Viral, and requiring its own RNA polymerase: T7, T7lac

Details of these promoters are discussed below.

E. coli Native Promoters

E. coli's own promoters were the first ones used to drive overexpression of proteins in
bacteria. They are strong promoters and can be induced with relatively inexpensive
chemicals, but they have largely been superseded by other promoters.

lac:

• Promoter of the lac operon
• Repressed by the LacI repressor (the product of the lacI gene), which binds the operator
downstream of the promoter
• Induced by lactose or its analogues; in expression work the non-hydrolysable analogue
IPTG is used.

trp:

• Promoter of the tryptophan biosynthetic operon
• Repressed by Trp, so induction is achieved by Trp starvation or by adding the Trp
analogue 3-β-indoleacrylic acid

tac & trc

Although not naturally found in E. coli, the synthetic tac and trc promoters can be
classified as E. coli promoters, as they were created by fusing elements of the lac and
trp promoters, which makes them more powerful than either parental promoter alone. The
tac and trc promoters are:

• Synthetic promoters created by fusion of the trp and lac promoters
• Built from the -35 region of trp and the -10 region of lac
• Regulated by the lac system, i.e. induced by IPTG
• Originally shown to be much stronger than either of the parent promoters

araB Promoter:

The arabinose promoter is perhaps the latest entry to the E. coli promoter family and
offers very tight control of expression. Several vectors with the ara promoter (e.g. pBAD)
are available from Invitrogen; the thioredoxin fusions in particular are worth a look.
One advantage of the pBAD vectors is the broad range of inducer (L-arabinose)
concentrations over which expression occurs, which makes it possible to fine-tune the
expression level to maximise solubility. The araB promoter is:

• The promoter of the arabinose operon in E. coli
• Very well repressed prior to induction
• Induced with arabinose
• Linearly tunable: the expression level follows the inducer concentration
• Available in the pBAD vectors from the Beckwith lab at Harvard
• Commercialised by Invitrogen

T7 promoter system

The system of choice for most scientists is the T7 system, which is based on the
powerful promoter of gene 10 of T7 phage and the fast and processive RNA polymerase of
the same phage.

Originally developed by William Studier in the late 1980s, it has become the most popular
expression system today. Novagen sells the pET system commercially, with tens of
different vectors carrying different fusions. Key features of the T7 promoter system:

• The promoter is that of gene 10 of bacteriophage T7
• It is recognized only by T7 RNA polymerase (T7RP), the product of T7 gene 1
• T7RP is faster and more processive than E. coli RNA polymerase
• Commercialized in the pET series of vectors from Novagen (tens of variants)
• T7RP can be inhibited by T7 lysozyme (pLysS/E plasmids)
• Usually combined with the lacO operator and the lacI gene to provide tight regulation of
expression (T7lac)
• Needs to be combined with a T7 transcription terminator

ii. Fusion Tags

A fusion tag is a protein or peptide located at either the N- or C-terminus of the target
protein that provides one or more of the following:

• Improved expression/yield
• Enhanced solubility
• Improved detection and purification
• Desired localization

iii. Protease cleavage site

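Enterokinase recognizes the sequence (Asp)4Lys and cleaves immediately after the lysine
residue, removing the fusion tag. For example, the following sequence encodes
Asp-Asp-Asp-Asp-Lys followed by Asp, so cleavage occurs between AAG and the final GAT:

GAC GAT GAC GAT AAG GAT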
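
A minimal Python sketch of how the site above could be located in a construct. The
function names and the tiny codon table are illustrative assumptions; a real sequence
would need the full genetic code.

```python
# Codon table restricted to the codons in the example above (assumption:
# extend to the full genetic code before using this on a real construct).
CODONS = {"GAC": "D", "GAT": "D", "AAG": "K", "AAA": "K"}

def translate(dna):
    """Translate a DNA string (spaces allowed) codon by codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))

def enterokinase_cut(protein):
    """Index of the first residue after the (Asp)4Lys site, or -1 if absent."""
    start = protein.find("DDDDK")
    return -1 if start < 0 else start + 5

site = translate("GAC GAT GAC GAT AAG GAT")
cut = enterokinase_cut(site)
print(site[:cut], "|", site[cut:])  # tag side | target side
```

Everything to the left of the reported index stays with the tag, and everything to the
right is released as the target protein.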

The cleavage site and fusion tag can also be synthesized at the C-terminus.

To facilitate assay of the fusion proteins, short antibody recognition sequences can be
incorporated between the tag and the protease cleavage site.

iv. Transcriptional Terminator

A GC-rich inverted repeat followed by a run of T residues (A residues on the template
strand) is present downstream of the site of insertion of the cloned gene:

5’-GTC AAAA GCCTCCG GT CGGAGGC TTTT GACT-3’
3’-CAG TTTT CGGAGGC CA GCCTCCG AAAA CTGA-5’

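
As a rough check on the terminator shown above, the two arms of the inverted repeat
should be reverse complements of each other, which is what lets the transcript fold into
a hairpin. The Python sketch below tests this. The helper names are illustrative and only
the sequence itself comes from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Top strand of the terminator, written 5' to 3' without spaces.
top = "GTCAAAAGCCTCCGGTCGGAGGCTTTTGACT"
# Bottom strand as printed in the text, i.e. written 3' to 5'.
bottom_3_to_5 = "CAGTTTTCGGAGGCCAGCCTCCGAAAACTGA"

left_arm, right_arm = "GCCTCCG", "CGGAGGC"

# The arms pair with each other if one is the reverse complement of the other.
forms_hairpin = reverse_complement(left_arm) == right_arm
# The printed bottom strand should be the base-by-base complement of the top.
strands_match = top.translate(COMPLEMENT) == bottom_3_to_5

print(forms_hairpin, strands_match)
```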
2. Expression Host

The choice of an expression host depends on the promoter system to be used. Promoters
that depend on E. coli RNA polymerase can be expressed in most common cloning strains,
while T7 promoter vectors must be used in E. coli strains that co-express T7 RNA
polymerase.

Expression in the T7 system requires a host strain lysogenized by the λ derivative DE3,
which encodes T7 RNA polymerase (bacteriophage T7 gene 1) under the control of the
IPTG-inducible lacUV5 promoter (Fig). LacI represses both the lacUV5 promoter and the
T7/lac hybrid promoter encoded by the expression plasmid. A copy of the lacI gene is
present on the E. coli genome and, in a number of pET configurations, on the plasmid as
well. lacI is a weakly expressed gene, and a 10-fold enhancement of repression is achieved
when the overexpressing promoter mutant lacIq is employed. The T7 RNA polymerase gene is
transcribed when IPTG binds tetrameric LacI and triggers its release from the lac
operator. Transcription of the target gene from the T7/lac hybrid promoter (which is
also repressed by LacI until induction) is subsequently initiated by T7 RNA polymerase.

Other characteristics of expression strains:

a. Should be deficient in the most harmful natural proteases
b. Should maintain the expression plasmid stably
c. Should carry the genetic elements relevant to the expression system

Some Expression Strains

BL21(DE3):

Workhorse of the T7 system

Carries the lysogenic λ phage DE3, which contains a copy of the T7 RNA polymerase gene
under the control of the lacUV5 promoter

BL21(DE3) is the original expression strain developed by William Studier et al. for the
T7 system. It remains the strain of choice in many cases, and many of the variants
listed below are based on it. It is a relatively wild-type strain that grows fast and as
such is well suited for expression work. Some doubt existed over its safety and its
ability to colonise the gut of humans (and other animals), but this seems to have been
settled after a specially commissioned study found it to be similar in pathogenicity to
commonly used, safe cloning strains like DH5α.

BL21(DE3) derivatives:

Rosetta(DE3): supplies additional tRNAs for rare Arg, Pro, Gly, Leu and Ile codons

BL21(DE3)CodonPlus RP/RIL: similar to the one above

BL21(DE3)Star: reduced RNase E activity due to the rne131 mutation (the rne gene
encodes RNase E, which degrades mRNA)

BL21-AI: arabinose-inducible T7RP

BL21-SI: salt-inducible variant

Redox modified strains

Many extracellular eukaryotic proteins contain disulfide bonds that stabilise their
structure, but producing such proteins in E. coli can be problematic: the bacterial
cytoplasm is reducing, so disulfide bonds are unlikely to form and, if formed, are very
unstable. To overcome this problem, strains with a more oxidising intracellular
environment have been developed, mainly by the group of Jon Beckwith at Harvard
University, and commercialised by Novagen. These strains carry a deletion of either
thioredoxin reductase (trxB) alone or of both trxB and glutathione reductase (gor), and
as a result allow (some) disulfide bond formation in the cytoplasm:

BL21(DE3)trxB: thioredoxin reductase (trxB gene) deletion

Origami(DE3): trxB and glutathione reductase (gor gene) deletions

3. Insert (Gene of interest)

An insert for expression needs to:

- make sense structurally
- be in frame with the ATG codon and/or any fusion tag
- be in frame with the stop codon
- contain suitable ends for cloning
- be correct in sequence (!)
- be compatible with the restriction sites you plan to use

As the vectors used for expression work carry all the elements needed for high level
expression, all that is left for you to do is to create an insert that takes advantage of that
design.

Perhaps the most important thing is to make sure your insert is cloned in the correct
orientation and in the right translation frame to produce the protein you have decided to
make. If you have no N-terminal fusion, you need to make sure an initiation codon (ATG,
for methionine) is present and in frame with the rest of the protein. You also need to
make sure a sensible stop codon exists to avoid producing excessively long tails. Most
commercial vectors have stop codons in all three frames, but rather than relying on
this, introduce one yourself in the PCR primer in the ideal position. Of course, if you
are using a C-terminal fusion, you have to ensure the reading frame continues into the
fusion in the right frame; the stop codon will then be the one provided by the vector.
You will also need to make sure you can clone the insert into the vector(s) of choice by
computing a restriction map of your insert's sequence.
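
The frame checks described above can be automated before ordering primers. The Python
sketch below assumes the simplest case, an insert with no fusion tags that carries its
own ATG and stop codon. The function name and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_orf(seq):
    """Return a list of problems found in a stand-alone coding sequence."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems

print(check_orf("ATGGCTAAATAA"))   # clean: ATG GCT AAA TAA
print(check_orf("ATGTAAGCTTAA"))   # premature TAA after the ATG
```

An empty list means the insert passed all four checks. For a C-terminal fusion the
stop-codon check would have to be dropped, since the vector supplies the stop.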

Once you have done the cloning, made your minipreps and verified that the correct insert
is present in the plasmid, you still need to confirm the sequence by sending a sample to
a sequencing service. PCR (even with the best high-fidelity polymerases) introduces some
errors, and it would not be sensible to skip sequencing of the construct at this stage.

N-terminus of the Insert

The 5' PCR primer needs to contain:

- a restriction enzyme recognition site (plus a few extra bases at the 5' end for
efficient digestion)
- an ATG codon in frame for the initiator methionine
- sequence that anneals with the target gene (in frame)
- codons optimized for E. coli

For example:

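
A minimal Python sketch of assembling a 5' primer from the parts listed above. The BamHI
site (GGATCC), the three extra bases, the 18-base annealing length and the gene sequence
are all illustrative assumptions, not values taken from any particular vector, and codon
optimization is not attempted.

```python
def forward_primer(gene, site="GGATCC", extra="CGC", anneal_len=18):
    """Build extra bases + restriction site + annealing region starting at ATG.

    `gene` is the coding strand, 5' to 3'. If it does not already begin with
    ATG, an initiator ATG is added in front of it.
    """
    gene = gene.upper()
    coding = gene if gene.startswith("ATG") else "ATG" + gene
    return extra + site + coding[:anneal_len]

# Hypothetical start of a gene of interest.
gene_start = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"
primer = forward_primer(gene_start)
print(primer)
```

Whether the ATG ends up in frame with vector-encoded sequence depends on the vector, so
the spacing between the site and the ATG should be checked against the vector map.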
C-terminus of the Insert

The 3' PCR primer needs to contain:

- a restriction enzyme recognition site (plus a few extra bases at the 5' end for
efficient digestion)
- a stop codon (TAA/TAG/TGA) in frame with your gene
- sequence that anneals with the target gene

Remember to reverse complement the primer!

For example:
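
A minimal Python sketch of the 3' primer, built on the coding strand first and then
reverse complemented as a whole. The XhoI site (CTCGAG), the extra bases, the TAA stop,
the annealing length and the gene sequence are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def reverse_primer(gene, site="CTCGAG", extra="CGC", stop="TAA", anneal_len=18):
    """Build extra bases + site + stop + annealing region, read 5' to 3'.

    `gene` is the coding strand. Any stop codon already at its end is
    replaced by `stop` so that exactly one stop follows the last codon.
    """
    gene = gene.upper()
    if gene[-3:] in STOP_CODONS:
        gene = gene[:-3]
    # Layout on the coding strand: ...last codons, stop codon, restriction site
    sense = gene[-anneal_len:] + stop + site
    return extra + reverse_complement(sense)

# Hypothetical gene of interest, ending in its own stop codon.
gene = "ATGGCTAGCAAAGGCATGGATGAACTATACAAATAA"
primer = reverse_primer(gene)
print(primer)
```

Reverse complementing the finished primer should give back the end of the gene followed
by the stop codon and the site, which is a quick way to confirm the frame.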

4. Transformation of the Construct


Transformation is a process in which DNA (e.g. a plasmid) is taken up by a bacterial or
yeast cell.

A cell that can be transformed is said to be competent.

Most commonly, cells are treated with CaCl2 to make them competent.

Transformation efficiency (TE)

The number of transformed cells (colonies) obtained per µg of DNA added to the cells.
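
As a worked illustration of this definition, the Python sketch below converts a colony
count into colonies per µg, correcting for the fraction of the transformation mix that
was actually plated. The function, its parameter names and the numbers are hypothetical.

```python
def transformation_efficiency(colonies, dna_ng, total_volume_ul,
                              plated_volume_ul, dilution_factor=1.0):
    """Colonies per microgram of DNA, based on the DNA actually plated.

    dilution_factor is the fold dilution applied before plating
    (1.0 means the mix was plated undiluted).
    """
    dna_ug = dna_ng / 1000.0
    fraction_plated = plated_volume_ul / total_volume_ul / dilution_factor
    return colonies / (dna_ug * fraction_plated)

# Hypothetical experiment: 1 ng of plasmid, 1000 µl recovery volume,
# 100 µl plated undiluted, 150 colonies counted.
te = transformation_efficiency(150, dna_ng=1, total_volume_ul=1000,
                               plated_volume_ul=100)
print(f"{te:.2e} colonies per µg")
```

In this example 0.0001 µg of DNA reached the plate, so 150 colonies correspond to about
1.5 × 10^6 transformants per µg.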

Factors Affecting Transformation Efficiency

Strain type, growth medium and temperature, growth phase, duration of heat shock, type of
DNA (e.g. supercoiled plasmid or ligated plasmid), vector size, etc.

Transformation Methods:

Chemical
Involves a brief heat shock applied to ice-cold cells mixed with the vector DNA.

Electroporation
The cells and DNA are placed between two flat electrodes in a special cuvette and
subjected to a brief (~5 ms) high-voltage (1800 V) pulse, which produces transient pores
in the cell membrane through which the DNA enters the cell.
Important: Prior to electroporation, the cells must be washed and suspended in deionized
water to remove salts that would otherwise cause electrical arcing in the cuvette.

Selection of Transformed Cells (Selection Markers)

Most vectors carry at least one gene that confers antibiotic resistance on the host cells,
e.g. a selectable marker gene such as bla, which encodes β-lactamase, an enzyme that
hydrolyses ampicillin into a form that is non-toxic to the bacterium.
Tetracycline (tetR) and kanamycin (KanR) resistance markers are also available.

Limitations of E. coli Expression System and Strategies to Overcome Them

Codon Bias
Codon usage may also affect the level of protein expression. If the gene of interest
contains codons not commonly used in E. coli, low expression may result due to the
depletion of tRNAs for the rarer codons. When one or more rare codons are encountered,
translational pausing may result, slowing the rate of protein synthesis and exposing the
mRNA to degradation. This potential problem is of particular concern when the sequence
encodes a protein >60 kDa, when rare codons are found at high frequency, or when
multiple rare codons are found over a short distance of the coding sequence. However,
the appearance of a rare codon does not necessarily lead to poor expression. It is best to
try expression of the native gene first, and then make changes if needed. Strategies
include mutating the gene of interest to use optimal codons for the host organism,
and co-transforming the host with rare tRNA genes.

Rare codons: Arginine AGG, AGA, CGA; Leucine CTA; Isoleucine ATA; Proline CCC.
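
A minimal Python sketch for screening a coding sequence against the rare codons listed
above. It reports where they occur, what fraction of codons they make up, and whether
several fall close together. The window size, threshold and example sequence are
illustrative assumptions.

```python
# Rare codons as listed in the text above.
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}

def rare_codon_report(cds):
    """Return (codon positions of rare codons, fraction of codons that are rare)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    positions = [i for i, codon in enumerate(codons) if codon in RARE_CODONS]
    fraction = len(positions) / len(codons) if codons else 0.0
    return positions, fraction

def has_cluster(positions, window=5, min_count=2):
    """True if at least min_count rare codons fall within `window` codons."""
    last_start = len(positions) - min_count + 1
    return any(positions[i + min_count - 1] - positions[i] < window
               for i in range(last_start))

# Hypothetical coding sequence: ATG AGG AGA GCT CTA AAA TAA
positions, fraction = rare_codon_report("ATGAGGAGAGCTCTAAAATAA")
print(positions, round(fraction, 2), has_cluster(positions))
```

Positions are zero-based codon indices, so the ATG is codon 0. A flagged cluster is only
a hint, since a rare codon does not necessarily lead to poor expression.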

Disulfide bond formation

There are many things that E. coli does not do well, or at all. If the protein of interest
is naturally multimeric, or requires post-translational modifications for activity, E.
coli may be a poor choice of expression host. Disulfide bonds, formed between two
cysteines in an expressed protein, are made inefficiently in the reducing environment of
the E. coli cytoplasm. If the protein is produced and can be purified from E. coli, in
vitro oxidation of the cysteines may be tried. Alternatively, the gene of interest can be
cloned in a vector that includes a signal sequence (e.g., OmpA, geneIII, and phoA)
that will direct the recombinant protein to the relatively oxidizing environment of
the periplasm of E. coli, where disulfide formation is more efficient. Strains of E.
coli that are deficient in thioredoxin reductase (trxB) permit proper disulfide
formation in the cytoplasm. Subsequent work has produced strains that lack both trxB
and glutathione reductase (gor) and give better rates of disulfide formation than those
seen in the native E. coli periplasm.

Inclusion body formation
Inclusion body formation is often encountered in E. coli protein expression. Improper
folding of proteins at high expression rates results in protein aggregates termed
inclusion bodies. Inclusion body formation can be reduced by:

• expressing the protein of interest as a fusion with a highly soluble protein such as
maltose binding protein (MBP), glutathione S-transferase (GST) or Small Ubiquitin-like
Modifier (SUMO)
• lowering the expression temperature and shortening the induction time
• lowering the IPTG concentration used for induction

If inclusion bodies still form, the protein can often be recovered by unfolding it with
chaotropic agents such as urea or guanidine hydrochloride, followed by slow, gradual
refolding through dialysis against a physiological buffer.

Post-translational Modification
E. coli does not glycosylate or phosphorylate proteins or recognize proteolytic processing
signals from eukaryotes, so take this into account when designing the cloning strategy. If
proteolytic processing is needed, it is best to express only the coding sequences for the
fully processed protein. If the protein of interest requires glycosylation for activity, and
full activity is important in the final use, consider a eukaryotic host.

Protein Functionality
Each requirement placed on a recombinant protein will affect the choice of expression
system. If a protein is to be used only to prepare antibody, it need not be soluble or
active, and the production of inclusion bodies (aggregates of improperly folded protein)
in E. coli may be all that is needed. Alternatively, if a protein’s biological activity will be
assayed, or if it is to be used in structural studies (NMR, crystallography, etc.), a properly
folded and soluble form will be required.
