
Directed Evolution: Methods and Protocols (Methods in Molecular Biology 2461), Andrew Currin (Editor)
Methods in
Molecular Biology 2461

Andrew Currin
Neil Swainston Editors

Directed
Evolution
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY

Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK

For further volumes:


http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and
methodologies in the critically acclaimed Methods in Molecular Biology series. The series was
the first to introduce the step-by-step protocols approach that has become the standard in all
biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-
step fashion, opening with an introductory overview, a list of the materials and reagents
needed to complete the experiment, and followed by a detailed procedure that is supported
with a helpful notes section offering tips and tricks of the trade as well as troubleshooting
advice. These hallmark features were introduced by series editor Dr. John Walker and
constitute the key ingredient in each and every volume of the Methods in Molecular Biology
series. Tested and trusted, comprehensive and reliable, all protocols from the series are
indexed in PubMed.
Directed Evolution

Methods and Protocols

Edited by

Andrew Currin and Neil Swainston


Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
Editors
Andrew Currin, Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
Neil Swainston, Manchester Institute of Biotechnology, University of Manchester, Manchester, UK

ISSN 1064-3745 ISSN 1940-6029 (electronic)


Methods in Molecular Biology
ISBN 978-1-0716-2151-6 ISBN 978-1-0716-2152-3 (eBook)
https://doi.org/10.1007/978-1-0716-2152-3

© Springer Science+Business Media, LLC, part of Springer Nature 2022


This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC part of Springer
Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

Directed evolution (DE) is a powerful approach for the engineering of biological molecules,
often employed to introduce novel desirable characteristics. Successful DE projects require
the integration of a number of different key approaches, including computational design,
DNA mutagenesis and cloning, biochemistry, and screening. With such an array of skills
required, detailed methodologies and protocols provide an invaluable insight into the latest
techniques. We are therefore delighted to share the new chapters in this edition and hope
that these provide a useful resource to incorporate into new experimental workflows. We
would like to thank all the contributing authors for their efforts in writing these chapters,
particularly as this has been completed during the COVID-19 pandemic.
In recent years, advances in technology have permitted DE to be employed at a larger scale,
at high (<10^6 samples per day) and ultra-high (>10^6 samples per day) throughput. These
approaches rely greatly on in silico capabilities to design, process, and analyze these
experiments, often generating large datasets. Such experiments are enhanced through the use of a
learn process, to understand and predict the relationship between genotype and phenotype.
This edition of Methods in Molecular Biology attempts to equip an experimenter with the
latest techniques in DE, covering aspects at each stage of the Design-Build-Test-Learn cycle.

Manchester, UK Andrew Currin


Manchester, UK Neil Swainston

Contents

Preface
Contributors
1 Designing Overlap Extension PCR Primers for Protein Mutagenesis: A Programmatic Approach
  Xiaofang Huang, Liangting Xu, Chuyun Bi, Lili Zhao, Limei Zhang, Xuanyang Chen, Shiqian Qi, and Shiqiang Lin
2 Recombination of Single Beneficial Substitutions Obtained from Protein Engineering by Computer-Assisted Recombination (CompassR)
  Haiyang Cui, Mehdi D. Davari, and Ulrich Schwaneberg
3 Nondegenerate Saturation Mutagenesis: Library Construction and Analysis via MAX and ProxiMAX Randomization
  Anupama Chembath, Ben P. G. Wagstaffe, Mohammed Ashraf, Marta M. Ferreira Amaral, Laura Frigotto, and Anna V. Hine
4 Antha-Guided Automation of Darwin Assembly for the Construction of Bespoke Gene Libraries
  P. Handal-Marquez, M. Koch, D. Kestemont, S. Arangundy-Franklin, and V. B. Pinheiro
5 SpeedyGenesXL: an Automated, High-Throughput Platform for the Preparation of Bespoke Ultralarge Variant Libraries for Directed Evolution
  Joanna C. Sadler, Neil Swainston, Mark S. Dunstan, Andrew Currin, and Douglas B. Kell
6 Facile Assembly of Combinatorial Mutagenesis Libraries Using Nicking Mutagenesis
  Monica B. Kirby and Timothy A. Whitehead
7 GeneORator: An Efficient Method for the Systematic Mutagenesis of Entire Genes
  Lucy Green, Nigel S. Scrutton, and Andrew Currin
8 Rapid Cloning of Random Mutagenesis Libraries Using PTO-QuickStep
  Pawel Jajesniak, Kang Lan Tee, and Tuck Seng Wong
9 Construction of Strong Promoters by Assembling Sigma Factor Binding Motifs
  Yonglin Zhang, Yang Wang, Jianghua Li, Chao Wang, Guocheng Du, and Zhen Kang
10 Application of Restriction Free (RF) Cloning in Circular Permutation
  Boudhayan Bandyopadhyay and Yoav Peleg
11 Site-Directed Mutagenesis Method Mediated by Cas9
  Wanping Chen, Wenwen She, Aitao Li, Chao Zhai, and Lixin Ma
12 Directed Evolution of Transcription Factor-Based Biosensors for Altered Effector Specificity
  Leopoldo Ferreira Marques Machado and Neil Dixon
13 A Screening Method for P450 BM3 Mutant Libraries Using Multiplexed Capillary Electrophoresis for Detection of Enzymatically Converted Compounds
  Anna Gärtner, Gustavo de Almeida Santos, Anna Joëlle Ruff, and Ulrich Schwaneberg
14 Directed Evolution of Glycosyltransferases by a Single-Cell Ultrahigh-Throughput FACS-Based Screening Method
  Yumeng Tan, Xue Zhang, Yan Feng, and Guang-Yu Yang
15 Learning Strategies in Protein Directed Evolution
  Xavier F. Cadet, Jean Christophe Gelly, Aster van Noord, Frédéric Cadet, and Carlos G. Acevedo-Rocha

Index
Contributors

CARLOS G. ACEVEDO-ROCHA • Biosyntia ApS, Copenhagen, Denmark


MARTA M. FERREIRA AMARAL • College of Health and Life Sciences, Aston University, Aston
Triangle, Birmingham, UK; Bicycle Therapeutics, Cambridge, UK
S. ARANGUNDY-FRANKLIN • Medical Research Council Laboratory of Molecular Biology,
Cambridge, UK; Sangamo Therapeutics Inc., CA, USA
MOHAMMED ASHRAF • College of Health and Life Sciences, Aston University, Aston Triangle,
Birmingham, UK
BOUDHAYAN BANDYOPADHYAY • Department of Biotechnology, School of Life Science and
Biotechnology, Adamas University, Kolkata, West Bengal, India
CHUYUN BI • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
FRÉDÉRIC CADET • Laboratoire d’Excellence GR-Ex, Paris, France; BIGR, DSIMB,
UMR_S1134, INSERM, University of Paris & University of Reunion, Paris, France
XAVIER F. CADET • PEACCEL, Artificial Intelligence Department, Paris, France
ANUPAMA CHEMBATH • College of Health and Life Sciences, Aston University, Aston Triangle,
Birmingham, UK
WANPING CHEN • State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei
Collaborative Innovation Center for Green Transformation of Bio-resources, School of Life
Sciences, Hubei University, Wuhan, People’s Republic of China
XUANYANG CHEN • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
HAIYANG CUI • Institute of Biotechnology, RWTH Aachen University, Aachen, Germany
ANDREW CURRIN • Manchester Institute of Biotechnology, University of Manchester,
Manchester, UK
MEHDI D. DAVARI • Institute of Biotechnology, RWTH Aachen University, Aachen, Germany
GUSTAVO DE ALMEIDA SANTOS • Institute of Biotechnology, RWTH Aachen University,
Aachen, Germany
NEIL DIXON • Manchester Institute of Biotechnology (MIB), The University of Manchester,
Manchester, UK; Department of Chemistry, The University of Manchester, Manchester, UK;
SYNBIOCHEM, The University of Manchester, Manchester, UK
GUOCHENG DU • The Key Laboratory of Carbohydrate Chemistry and Biotechnology,
Ministry of Education, Jiangnan University, Wuxi, China; The Key Laboratory of
Industrial Biotechnology, Ministry of Education, School of Biotechnology, Wuxi, China
MARK S. DUNSTAN • School of Chemistry, The University of Manchester, Manchester, UK; The
Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK;
Centre for the Synthetic Biology of Fine and Specialty Chemicals (SYNBIOCHEM), The
University of Manchester, Manchester, UK
YAN FENG • State Key Laboratory of Microbial Metabolism, School of Life Sciences and
Biotechnology, Shanghai Jiao Tong University, Shanghai, China
LAURA FRIGOTTO • Isogenica Ltd., Essex, UK; Evonetix Ltd, Essex, UK
ANNA GÄRTNER • Institute of Biotechnology, RWTH Aachen University, Aachen, Germany

JEAN CHRISTOPHE GELLY • Laboratoire d’Excellence GR-Ex, Paris, France; BIGR, DSIMB,
UMR_S1134, INSERM, University of Paris & University of Reunion, Paris, France
LUCY GREEN • Manchester Synthetic Biology Research Centre for Fine and Speciality
Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry,
Faculty of Science and Engineering, University of Manchester, Manchester, UK
P. HANDAL-MARQUEZ • Rega Institute, KU Leuven, Leuven, Belgium
ANNA V. HINE • College of Health and Life Sciences, Aston University, Aston Triangle,
Birmingham, UK
XIAOFANG HUANG • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
PAWEL JAJESNIAK • Department of Chemical and Biological Engineering, ChELSI Institute
and Advanced Biomanufacturing Centre, University of Sheffield, England, UK
ZHEN KANG • The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of
Education, Jiangnan University, Wuxi, China; The Key Laboratory of Industrial
Biotechnology, Ministry of Education, School of Biotechnology, Wuxi, China; The Science
Center for Future Foods, Jiangnan University, Wuxi, China
DOUGLAS B. KELL • Department of Biochemistry and Systems Biology, Institute of Systems,
Molecular and Integrative Biology, University of Liverpool, Liverpool, UK; The Novo
Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs
Lyngby, Denmark
D. KESTEMONT • Rega Institute, KU Leuven, Leuven, Belgium
MONICA B. KIRBY • Department of Chemical and Biological Engineering, University of
Colorado, Boulder, CO, USA
M. KOCH • Synthace Ltd., London, UK
AITAO LI • State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei
Collaborative Innovation Center for Green Transformation of Bio-resources, School of Life
Sciences, Hubei University, Wuhan, People’s Republic of China
JIANGHUA LI • The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of
Education, Jiangnan University, Wuxi, China; The Key Laboratory of Industrial
Biotechnology, Ministry of Education, School of Biotechnology, Wuxi, China
SHIQIANG LIN • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
LIXIN MA • State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei
Collaborative Innovation Center for Green Transformation of Bio-resources, School of Life
Sciences, Hubei University, Wuhan, People’s Republic of China
LEOPOLDO FERREIRA MARQUES MACHADO • Manchester Institute of Biotechnology (MIB), The
University of Manchester, Manchester, UK; Department of Chemistry, The University of
Manchester, Manchester, UK
YOAV PELEG • Structural Proteomics Unit (SPU), Life Sciences Core Facilities (LSCF),
Weizmann Institute of Science, Rehovot, Israel
V. B. PINHEIRO • Rega Institute, KU Leuven, Leuven, Belgium; Institute of Structural and
Molecular Biology, University College London, London, UK
SHIQIAN QI • Department of Urology, State Key Laboratory of Biotherapy, West China
Hospital, College of Life Sciences, Sichuan University, Chengdu, China
ANNA JOËLLE RUFF • Institute of Biotechnology, RWTH Aachen University, Aachen,
Germany
JOANNA C. SADLER • School of Chemistry, The University of Manchester, Manchester, UK;
Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological
Sciences, University of Edinburgh, Edinburgh, UK; The Manchester Institute of
Biotechnology, The University of Manchester, Manchester, UK; Centre for the Synthetic
Biology of Fine and Specialty Chemicals (SYNBIOCHEM), The University of Manchester,
Manchester, UK; Mellizyme Biotechnology Ltd, Liverpool Science Park, Liverpool, UK
ULRICH SCHWANEBERG • Institute of Biotechnology, RWTH Aachen University, Aachen,
Germany; DWI Leibniz-Institute for Interactive Materials, Aachen, Germany
NIGEL S. SCRUTTON • Manchester Synthetic Biology Research Centre for Fine and Speciality
Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, School of Chemistry,
Faculty of Science and Engineering, University of Manchester, Manchester, UK
WENWEN SHE • State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei
Collaborative Innovation Center for Green Transformation of Bio-resources, School of Life
Sciences, Hubei University, Wuhan, People’s Republic of China
NEIL SWAINSTON • Manchester Institute of Biotechnology, University of Manchester,
Manchester, UK
YUMENG TAN • State Key Laboratory of Microbial Metabolism, School of Life Sciences and
Biotechnology, Shanghai Jiao Tong University, Shanghai, China
KANG LAN TEE • Department of Chemical and Biological Engineering, ChELSI Institute
and Advanced Biomanufacturing Centre, University of Sheffield, England, UK
ASTER VAN NOORD • Biosyntia ApS, Copenhagen, Denmark
BEN P. G. WAGSTAFFE • BattLab Ltd., Coventry, UK
CHAO WANG • The Science Center for Future Foods, Jiangnan University, Wuxi, China
YANG WANG • The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of
Education, Jiangnan University, Wuxi, China
TIMOTHY A. WHITEHEAD • Department of Chemical and Biological Engineering, University
of Colorado, Boulder, CO, USA
TUCK SENG WONG • Department of Chemical and Biological Engineering, ChELSI Institute
and Advanced Biomanufacturing Centre, University of Sheffield, England, UK
LIANGTING XU • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
GUANG-YU YANG • State Key Laboratory of Microbial Metabolism, School of Life Sciences and
Biotechnology, Shanghai Jiao Tong University, Shanghai, China
CHAO ZHAI • State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei
Collaborative Innovation Center for Green Transformation of Bio-resources, School of Life
Sciences, Hubei University, Wuhan, People’s Republic of China
LIMEI ZHANG • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
XUE ZHANG • State Key Laboratory of Microbial Metabolism, School of Life Sciences and
Biotechnology, Shanghai Jiao Tong University, Shanghai, China
YONGLIN ZHANG • The Key Laboratory of Carbohydrate Chemistry and Biotechnology,
Ministry of Education, Jiangnan University, Wuxi, China
LILI ZHAO • Key Laboratory of Crop Biotechnology, Fujian Agriculture and Forestry
University, Fujian Province Universities, Fuzhou, China; College of Life Science, Fujian
Agriculture and Forestry University, Fuzhou, China
Chapter 1

Designing Overlap Extension PCR Primers for Protein Mutagenesis: A Programmatic Approach
Xiaofang Huang, Liangting Xu, Chuyun Bi, Lili Zhao, Limei Zhang,
Xuanyang Chen, Shiqian Qi, and Shiqiang Lin

Abstract
Overlap extension PCR is one of the routinely used methods to generate mutagenic genes for the functional
and structural study of proteins. However, it is time-consuming to design the overlapping mutagenic
primers and gene primers by manual operation. In this chapter, we present a Python script that is able to
search all the possible primer combinations according to the preset definitions and calculate the necessary
parameters of each primer for the users, which could facilitate the primer design process. Up to 256 pairs of
primers can be provided for selection using this script.

Key words Site-directed mutagenesis, Overlap extension PCR, Overlapping mutagenic primers,
Primer design

1 Introduction

Site-directed mutagenesis is frequently performed in the functional
study of genes and proteins, both in vitro and in vivo, and to
corroborate observations made on biomacromolecules. Generally,
whole plasmid PCR and overlap extension PCR are the two
most popular methods to prepare mutagenic constructs for producing
protein mutants [1, 2]. However, neither approach is
perfect, and both require high-quality primers.
In whole plasmid PCR, mutagenic primers are used to generate
the mutagenic DNA products, an approach which takes wild-type
plasmids as templates. After the PCR, the enzyme DpnI is added to
the sample to digest the wild type template plasmids. Subsequently,
the mixture is transformed into competent E. coli cells to grow
colonies, followed by colony PCR identification and DNA sequenc-
ing. This method is frequently applied due to its simplicity and
efficiency. However, during the PCR, mutations might be generated
accidentally within the plasmid backbone, which can affect the
subsequent experiments and are frequently not detected unless
whole plasmid sequencing is performed. Additionally, in the case
of large plasmids and especially long genes, DNA polymerase may
not work correctly and efficiently.

Andrew Currin and Neil Swainston (eds.), Directed Evolution: Methods and Protocols, Methods in Molecular Biology, vol. 2461,
https://doi.org/10.1007/978-1-0716-2152-3_1, © Springer Science+Business Media, LLC, part of Springer Nature 2022

Fig. 1 The procedure of overlap extension PCR for producing mutagenic genes. The sense strand of the
full-length gene is shown with a long thick blue arrow. The antisense strand of the full-length gene is shown with a
long thick brown arrow. The full-length gene is used as a template to amplify the mutagenic upstream
fragment with the forward gene primer (thin blue arrow) and the reverse overlapping mutagenic primer (thin
brown arrow with three green triangles representing the mutagenic codon), and the mutagenic downstream
fragment with the forward overlapping mutagenic primer (thin blue arrow with three purple triangles
representing the mutagenic codon) and the reverse gene primer (thin brown arrow), respectively. In the
third PCR reaction, the mutagenic upstream fragment and the mutagenic downstream fragment, which can
bridge when annealing, are used as the template to generate the full-length mutagenic gene with gene
primers

Overlap extension PCR requires two pairs of primers: one pair of
overlapping mutagenic primers, and one pair of gene primers for
amplification of the whole gene. The procedure usually involves
three reactions: the first two generate the upstream and downstream
fragments, while the third produces the full-length mutagenic
sequence (Fig. 1). Although the workflow is relatively inconvenient
because it requires three rounds of gel extraction, this approach is
advantageous since it does not introduce any mutations into the
vector backbone and can handle large vectors and long genes
efficiently. Additionally, the result of each PCR can be checked
directly on an agarose gel, allowing for a high rate of success in
introducing the target mutations.
The design of overlapping mutagenic primers is important for a
successful overlap extension PCR, but doing it manually is
time-consuming and error-prone. Currently, there are several tools
for automatic primer design, including commercial software [3],
free standalone software [4], and free online tools [5, 6]. However,
to date there has been no specialized software for designing
overlapping mutagenic primers. In this chapter, we provide a Python
script for searching overlapping mutagenic primers, according to an
algorithm based on years of manual design experience in our lab. The
script searches and lists the overlapping mutagenic primers by
preset rules according to corresponding parameters, facilitating
the selection of overlapping mutagenic primers and the design of
the whole gene primers.

2 Materials

2.1 Running Environment

1. Python 3.7.3 or higher, including IDLE (Integrated Development
and Learning Environment) [7].
2. Biopython 1.73 [8] (see Note 1).
3. The supplied Python script, Overlap_mutagenic_primer_v1.py
(see Note 2). Files can be downloaded from:
https://github.com/shiqiang-lin/sdm-mutagenesis
4. Codon preference table in plain txt format [9] (see Note 2).

2.2 Input Files

1. Gene sequence file in plain text format.
2. Mutation sites file in plain text format.
The example gene used here is SMCR8 from Homo sapiens
(GeneID: 140775), which was expressed in an insect cell-based
protein expression system [10]. The example mutation sites are
R147A, T862A, T862R, L867A, and L867R; they must be entered as
one mutation site per line and stored in a plain text file.
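
As an illustration of the input format described above, the following minimal sketch parses mutation labels such as R147A into the wild-type residue, the residue position, and the mutant residue. It is not taken from the supplied script; the names Mutation, parse_mutation, and parse_mutation_lines are hypothetical, and the assumption that each label is a single upper-case residue letter, a number, and a single residue letter is inferred only from the five examples given.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    """One mutation site, e.g. R147A (hypothetical helper type)."""
    wild_type: str   # residue in the wild-type protein
    position: int    # 1-based residue position
    mutant: str      # residue to introduce


# Assumed label format: one letter, digits, one letter (e.g. "L867R").
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Mutation:
    """Parse a single label; raise ValueError if it does not fit the format."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Mutation(wild_type, int(position), mutant)


def parse_mutation_lines(text: str) -> List[Mutation]:
    """Parse file contents with one mutation site per line, skipping blanks."""
    return [parse_mutation(line) for line in text.splitlines() if line.strip()]


# The five example mutation sites from the text, one per line.
example_text = "R147A\nT862A\nT862R\nL867A\nL867R\n"
mutations = parse_mutation_lines(example_text)
for mutation in mutations:
    print(mutation.wild_type, mutation.position, mutation.mutant)
```

Checking the labels in this way before a run can catch malformed lines early; whether the supplied script performs a similar validation is not stated in the chapter.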

2.3 Algorithm

A schematic view of the algorithm is shown in Fig. 2. Based on our
experience, the asymmetric primers with 3′ overhangs balance the
requirements of amplifying the upstream fragment, the downstream
fragment, and overlapping the two fragments in the third
PCR run. The parameters defining the searching ranges are shown
in Fig. 2. As shown in the figure, the range of i is from 0 to 3;
therefore, there are four possible lengths for the upstream flanking
region of the forward overlapping mutagenic primer; the range of
j is also from 0 to 3 so that there are four possible lengths for the
downstream flanking region of the forward overlapping mutagenic
primer. Thus, there are 4 × 4 = 16 possible forward overlapping
mutagenic primers. Likewise, the number of possible reverse
overlapping mutagenic primers is also 16. Taking both
the forward overlapping mutagenic primer and the reverse
overlapping mutagenic primer into account, the number of all possible
primer pairs is 16 × 16 = 256. The program lists the parameters for
each pair of primers with regard to the sequence, length, GC
percentage, melting temperature, and so on.
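
The counting argument above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the constant and function names are hypothetical, and the flank sizes (9 + i and 15 + j around a 3-nucleotide codon, with each extension ranging from 0 to 3) are taken from the description of Fig. 2.

```python
from itertools import product

# Values taken from the description of Fig. 2; names are hypothetical.
BASE_SHORT_FLANK = 9     # shorter flanking region, before extension
BASE_LONG_FLANK = 15     # longer flanking region, before extension
CODON_LENGTH = 3         # the mutagenic codon
EXTENSIONS = range(0, 4) # i, j, k, l each take the values 0, 1, 2, 3


def primer_length(short_ext: int, long_ext: int) -> int:
    """Total primer length for one choice of the two flank extensions."""
    return (BASE_SHORT_FLANK + short_ext) + CODON_LENGTH + (BASE_LONG_FLANK + long_ext)


# All (i, j) choices for the forward primer and (k, l) for the reverse primer.
forward_options = list(product(EXTENSIONS, EXTENSIONS))
reverse_options = list(product(EXTENSIONS, EXTENSIONS))

# Every forward primer can be paired with every reverse primer.
primer_pairs = list(product(forward_options, reverse_options))

# Distinct total lengths that a single primer can take.
distinct_lengths = sorted({primer_length(i, j) for i, j in forward_options})

print(len(forward_options), len(reverse_options), len(primer_pairs))
print(distinct_lengths)
```

Under these assumptions each primer has 16 flank combinations but only seven distinct total lengths (27 to 33 nucleotides), because different (i, j) choices can sum to the same length.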
Fig. 2 Asymmetric primers with overhangs for overlap extension PCR for mutagenesis. The forward over-
lapping mutagenic primer is shown with a thick blue arrow consisting of the mutagenic codon (purple
triangles), with flanking 9 + i nucleotides and 15 + j nucleotides at the N- and C-terminus respectively.
The reverse overlapping mutagenic primer is shown with a thick brown arrow consisting of the mutagenic
codon (green triangles), with flanking regions of 9 + k nucleotides and 15 + l nucleotides. The values i, j, k,
and l represent the lengths of the flanking regions within the primers, which define the primers’ lengths to be
searched by the program. The ranges of i, j, k, and l are all between 0 and 3 nucleotides (including three
nucleotides). The overlapping region between the two primers, critical for bridging the two mutated fragments
in generating the full-length mutagenic gene, is shown with brown dashes

3 Methods

1. Make a new directory and copy the Python script, codon
preference table file, prepared input gene file, and input
mutation sites file to the newly made directory.
2. Open a terminal window or command prompt, cd to the
directory made in step 1, and run the following command:

python3.7 Overlap_mutagenic_primer_v1.py input_gene.txt input_mutaa.txt Insect

Python3.7 is used to run the script Overlap_mutagenic_primer_v1.py.
The gene filename and mutation sites filename should follow the
script name, in that order. The final parameter (here, Insect)
specifies the codon preference table of the target host
cell. Please see Note 2 for details on all available codon
preference tables.
3. When the run is finished, please return to the directory men-
tioned in step 1. The script produces a new folder with the
name input_gene_primers, which contains five text files
(in this example, L867A$.txt, L867R$.txt, R147A$.txt,
T862A$.txt, T862R$.txt) each corresponding to one input
mutation.
4. The partial result of one example output file, L867A$.txt, is
shown in Fig. 3. There are six columns in the output file
showing the name, sequence, length, GC percentage, the Tm
value of each primer, and the Tm value of the overlapping
region for each pair of overlapping primers. There are
4 × 4 × 4 × 4 = 256 pairs of primers, which are ranked
according to increasing Overlap_Tm value.

primer           sequence                        length  GC     Tm     Overlap_Tm
L867A_overlap_5  TGCCACCACGCTCACCTGCCTACCCAC     27      66.67  67.59  62.36
L867A_overlap_3  TAGGCAGGTGAGCGTGGTGGCAGAAGGT    28      60.71  66.49  62.36

L867A_overlap_5  TGCCACCACGCTCACCTGCCTACCCAC     27      66.67  67.59  62.36
L867A_overlap_3  TAGGCAGGTGAGCGTGGTGGCAGAAGGTG   29      62.07  66.89  62.36
...
L867A_overlap_5  TTCTGCCACCACGCTCACCTGCCTACCCAC  30      63.33  68.28  63.81
L867A_overlap_3  TAGGCAGGTGAGCGTGGTGGCAGAAGGTG   29      62.07  66.89  63.81

L867A_overlap_5  TTCTGCCACCACGCTCACCTGCCTACCCAC  30      63.33  68.28  63.81
L867A_overlap_3  TAGGCAGGTGAGCGTGGTGGCAGAAGGTGT  30      60.0   67.68  63.81
...

Fig. 3 An example of the partial contents of an output file

5. From the overall view of all Tm and Overlap_Tm values, the
gene primers can be chosen manually according to these values
and the specific user requirements (see Note 4). Moreover,
users can add additional nucleotides to the gene primers, such
as restriction enzyme sites and flanking bases [11], or adapter
sequences for seamless cloning [12].
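
To make the output columns more concrete, the sketch below recomputes the length and GC percentage of the first primer pair listed in Fig. 3 and recovers the region in which the two primers overlap. It uses plain Python only and is not the supplied script; the function names are hypothetical. The Tm and Overlap_Tm columns are deliberately not recomputed, because the chapter does not state which melting temperature formula the script uses.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases, rounded to two decimals."""
    sequence = sequence.upper()
    gc_count = sequence.count("G") + sequence.count("C")
    return round(100.0 * gc_count / len(sequence), 2)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def overlap_region(forward: str, reverse: str) -> str:
    """Longest 5' stretch of the forward primer that is also covered by the
    reverse primer, expressed on the sense strand."""
    sense_of_reverse = reverse_complement(reverse)
    for size in range(min(len(forward), len(sense_of_reverse)), 0, -1):
        if sense_of_reverse.endswith(forward[:size]):
            return forward[:size]
    return ""


# First primer pair listed in Fig. 3 (mutation L867A).
forward_primer = "TGCCACCACGCTCACCTGCCTACCCAC"
reverse_primer = "TAGGCAGGTGAGCGTGGTGGCAGAAGGT"

overlap = overlap_region(forward_primer, reverse_primer)

print(len(forward_primer), gc_percent(forward_primer))
print(len(reverse_primer), gc_percent(reverse_primer))
print(overlap, len(overlap))
```

For this pair the lengths (27 and 28) and GC percentages (66.67 and 60.71) agree with the values in Fig. 3, and the overlapping region is 22 nucleotides long, spanning the mutagenic codon GCT.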

4 Notes

1. The required Python dependencies can be installed through
the following command, which will automatically install the
required Biopython 1.73 module:

sudo pip3.7 install biopython==1.73

2. The codon preference table file is provided in the Supplementary
Information. The species included are E. coli, yeast, insect,
C. elegans, D. melanogaster, human, mouse, rat, pig, P. pastoris,
A. thaliana, Streptomyces, Zea mays, tobacco, S. cerevisiae, and
Cricetulus griseus (CHO). The specific data format of the
codon preference tables can be seen by consulting the file.
Note that only one species can be specified in a given run.
3. To ensure that the upstream fragment and the downstream
fragment can be efficiently purified by gel extraction, the
6 Xiaofang Huang et al.

fragment lengths should be longer than 200 bp. Thus, the
position of the target mutant residue must be at least 65 residues
after the first residue and at least 65 residues before the last
residue. If the mutation site is near the N-terminus or
C-terminus of the protein, for example, within the range of
the first or last 35 residues, the mutagenic primers can be
directly synthesized and used to amplify the full-length mutant
gene. In cases where the desired mutation site is located
between 36 and 65 residues from the N-terminus or
C-terminus of the protein, which may result in primers being
too long for direct synthesis and too short for gel extraction,
users can generate the mutated gene using whole plasmid PCR
[13]. To ensure that the vector backbone remains unchanged
following the PCR process, the user can further amplify the
mutated gene using a plasmid containing the mutated gene
produced by the whole plasmid PCR as a template and put the
mutated gene into the wild-type vector.
4. In the third PCR reaction, the two DNA strands must overlap
so that one strand can serve as a template for the other when
DNA polymerase generates the full-length mutant gene. Thus, the
annealing temperature should be set according to the lowest
among the three values to facilitate bridging.

Acknowledgments

This work was supported by the National Key R&D Program of
China grant 2017YFA0506300 (Q.S.) and NSFC grant
81671388 (Q.S.).

References
1. Xiao YH, Pei Y (2011) Asymmetric overlap extension PCR method for site-directed mutagenesis. Methods Mol Biol 687:277–282
2. Wang H, Zhou N, Ding F, Li Z, Chen R, Han A, Liu R (2011) An efficient approach for site-directed mutagenesis using central overlapping primers. Anal Biochem 418:304–306
3. Primer Premier. A comprehensive PCR primer design software. http://www.premierbiosoft.com/primerdesign/index.html. Accessed 14 May 2020
4. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG (2012) Primer3—new capabilities and interfaces. Nucleic Acids Res 40:e115
5. PrimerX. Automated design of mutagenic primers for site-directed mutagenesis. http://www.bioinformatics.org/primerx/. Accessed 14 May 2020
6. O’Halloran DM, Uriagereka-Herburger I, Bode K (2017) STITCHER 2.0: primer design for overlapping PCR applications. Sci Rep 7:45349
7. Python. https://www.python.org/. Accessed 14 May 2020
8. Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25:1422–1423
9. GenScript codon usage frequency table (chart) tool. https://www.genscript.com/tools/codon-frequency-table. Accessed 14 May 2020
10. Tang D, Sheng J, Xu L, Zhan X, Liu J, Jiang H, Shu X, Liu X, Zhang T, Jiang L, Zhou C, Li W, Cheng W, Li Z, Wang K, Lu K, Yan C, Qi S (2020) Cryo-EM structure of C9ORF72-SMCR8-WDR41 reveals the role as a GAP for Rab8a and Rab11a. Proc Natl Acad Sci U S A 117:9876–9883
11. Cleavage close to the end of DNA fragments. https://www.neb.com/tools-and-resources/usage-guidelines/cleavage-close-to-the-end-of-dna-fragments. Accessed 14 May 2020
12. Okegawa Y, Motohashi K (2015) A simple and ultra-low cost homemade seamless ligation cloning extract (SLiCE) as an alternative to a commercially available seamless DNA cloning kit. Biochem Biophys Rep 4:148–151
13. Bi C, Huang X, Tang D, Shi Y, Zhou L, Hu Y, Chen X, Qi S, Lin S (2020) A python script to design site-directed mutagenesis primers. Protein Sci 29:1054–1059
Chapter 2

Recombination of Single Beneficial Substitutions Obtained
from Protein Engineering by Computer-Assisted
Recombination (CompassR)
Haiyang Cui, Mehdi D. Davari, and Ulrich Schwaneberg

Abstract
A large number of beneficial substitutions can be obtained from a successful directed enzyme evolution
campaign and/or (semi)rational design. The recombination of some beneficial substitutions is expected
to lead to a much higher degree of performance through synergistic effects. However, systematic
recombination studies show that poorly performing variants are often obtained after recombination of
three to four individual beneficial substitutions, which limits the ability of protein engineers to exploit
nature’s potential in generating better performing enzymes. The Computer-assisted Recombination
(CompassR) strategy allows the recombination of identified beneficial substitutions in an effective and
efficient manner in order to generate active enzymes with improved performance. Here, we describe in
detail the CompassR procedure with an example of recombining four substitutions and discuss some
important practical issues that should be considered (such as the selection of protein structures, number
of FoldX runs, and evaluation of calculations) when applying the CompassR rule. The core part of this
protocol (system setup, ΔΔGfold calculation, and CompassR application) is transferable to other enzymes
and to any recombination of single beneficial substitutions.

Key words Protein engineering, Directed evolution, Single substitution, Recombination, The relative
free energy of folding (ΔΔGfold), FoldX

1 Introduction

Beneficial substitutions of amino acid residues in enzyme sequences
can be obtained from directed evolution experiments after
screening a few thousand variants derived from random mutagenesis
or by (semi)rational design studies [1, 2]. Enzyme variants with a
much higher degree of performance can be achieved through the
recombination of those beneficial substitutions. However, numerous
reports have pointed out that recombining more than two or three
beneficial substitutions does not necessarily yield further improved
enzyme variants [3]. Several studies also indicated that beneficial
substitutions drive each other to “extinction” [1, 4–7]. In addition,

Andrew Currin and Neil Swainston (eds.), Directed Evolution: Methods and Protocols, Methods in Molecular Biology, vol. 2461,
https://doi.org/10.1007/978-1-0716-2152-3_2, © Springer Science+Business Media, LLC, part of Springer Nature 2022

10 Haiyang Cui et al.

simultaneous site saturation mutagenesis (SSM) at several positions
often leads to a dramatic decrease in the active population of
recombinants (0.67–16.55%) [1]. Therefore, in silico methods to guide
recombination experiments are highly desired to ensure a high
fraction of active recombinants and better performing enzyme
variants simultaneously.
The CompassR strategy guides the recombination of beneficial
substitutions through analysis of the relative free energy of folding
(ΔΔGfold = ΔGfold,sub − ΔGfold,wt). ΔΔGfold has been employed as a
measure of protein stability and to assess the relationship between
stability and function in several enzymes (e.g., TEM-1 β-lactamase
[8–11], cytochrome P450 BM3 [12], green fluorescent protein
avGFP [13], and others [14–17]). As enzymes must be able to
fold stably in order to function properly, ΔΔGfold provides a
prerequisite for ensuring that recombinants are active and a means
to predict highly stable combinatorial variants [1]. Variants with
higher stability tend to accept a broader range of beneficial
substitutions until the “robustness threshold” is reached [9, 11].
Bacillus subtilis Lipase A (BSLA), a well-studied enzyme, was chosen
to develop CompassR as a predictor for recombining beneficial
substitutions [1]. To confirm that BSLA can be used as a general
enzyme model, a standard staggered extension process (StEP)
recombination experiment was performed to recombine 39 BSLA
beneficial substitutions (39 = 13 positions × 3 substitutions) in
three genes. These 39 substitutions were identified in the
“BSLA-SSM” library [18]. The analysis of the StEP recombination
library demonstrated that BSLA faces a recombination challenge
similar to that of other reported enzymes (e.g., Pseudomonas
aeruginosa lipase [3], β-glucuronidase [5], PAMO [19], cpADH5 [20],
LEH [21]). To develop CompassR, 13 of the 39 beneficial
substitutions, located at 13 amino acid positions, were finally
selected based on their ΔΔGfold values (calculated by the FoldX
method [22]) for recombination studies. They were placed into three
categories according to the calculated ΔΔGfold values. The three
most stabilizing substitutions (F17S, V54K, and G155P) were selected
for two recombination campaigns (“intracategory” and
“intercategory”) with all other substitutions, and up to four
subsequent recombination experiments were performed (Fig. 1). Based
on the results obtained for a total of 84 BSLA recombinants with
respect to their active/inactive population and representative
ΔΔGfold values (Fig. 1), the following CompassR thresholds are used
to categorize substitutions: category A, “active recombinants”
(substitutions with ΔΔGfold ≤ +0.36 kcal/mol); category B,
“recombinants with unpredictable activity” (substitutions with
+0.36 < ΔΔGfold < +7.52 kcal/mol); or category C, “deactivating
recombinants” (ΔΔGfold ≥ +7.52 kcal/mol). CompassR expects
that all recombinants in category A (ΔΔGfold ≤ +0.36 kcal/mol)
are active, and that improvements gradually increase with increasing
Recombination of Single Beneficial Substitutions by CompassR 11

Fig. 1 Overview of all BSLA recombinants generated in the recombination within each category (“intracategory”)
and in the recombination of the beneficial substitutions F17S, V54K, and G155P with beneficial substitutions
from categories A (light green), B (light blue), and C (light purple) (“intercategory”). Categories (A, B, and C;
on the left) are composed of 13 selected beneficial substitutions obtained from the BSLA-SSM library and
grouped according to their ΔΔGfold values. Notation of recombinants: dark green, residual activity (in buffer)
≥ 80% of the BSLA wild-type activity; orange, residual activity (in buffer) between 10 and 80% of the BSLA
wild-type activity; red, residual activity (in buffer) between 0 and 10% of the BSLA wild-type activity, referred
to as “inactive” recombinant

the number of recombined beneficial substitutions. Recombination
with beneficial substitutions in category C should be avoided.
Recombination with beneficial substitutions in category B should be
considered as a backup only if few beneficial substitutions are
identified after recombining all beneficial substitutions from
category A [1]. In summary, the CompassR rule (Fig. 2) guides
experimentalists on how to generate highly active recombination
libraries based on ΔΔGfold values and enables the design of enzymes
with better properties (catalytic activity, stability, and resistance
to harsh environments).
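
The three-category rule can be written as a short function. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a value exactly at +0.36 kcal/mol belongs to category A and a value exactly at +7.52 kcal/mol belongs to category C.

```python
# CompassR thresholds in kcal/mol, as given in the text.
CATEGORY_A_MAX = 0.36   # assumed inclusive upper bound of category A
CATEGORY_C_MIN = 7.52   # assumed inclusive lower bound of category C

def compassr_category(ddg_fold):
    """Assign a single substitution to CompassR category A, B, or C."""
    if ddg_fold <= CATEGORY_A_MAX:
        return "A"   # active recombinants expected
    if ddg_fold >= CATEGORY_C_MIN:
        return "C"   # deactivating; avoid in recombination
    return "B"       # unpredictable activity; backup only

for value in (-0.03, 0.36, 4.89, 7.52, 14.38):
    print(value, compassr_category(value))
```

Keeping the two thresholds as named constants makes it easy to adjust them for other proteins, since the exact values may shift with the protein and its fold.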
CompassR possesses broad applicability in combination with
many protein engineering strategies. It has strong potential to
speed up the identification of significantly improved enzymes
using CASTing [23], KnowVolution [7], or MORPHING
[24]. In addition, CompassR can also be implemented as a
preselector in gene recombination experiments (e.g., gene shuffling
[25] and StEP [26]) or in rationality-guided methods like SCHEMA
[27–30] or PTRec [31].

Fig. 2 Computer-assisted Recombination (CompassR) rule workflow. Preparation: The initial enzyme structure
and substitution list are needed, and the FoldX plugin for YASARA needs to be downloaded and installed.
Step 1: Load the PDB file of the wild-type enzyme. Step 2: Select the target substitution for calculation of
ΔΔGfold. Step 3: Set parameters for FoldX. Step 4: Collect the ΔΔGfold results and align them with the
CompassR rule. Step 5: Make the recombination plan for beneficial substitutions

2 Material

To calculate the ΔΔGfold values required for the application of
CompassR, one wild-type protein structure (e.g.,
1i6w_A_noSOL.pdb) and the required software (FoldX and YASARA
Structure), running under a suitable operating system (e.g.,
Microsoft Windows 10), should be obtained (see Note 1).

2.1 The Initial Enzyme Structure and Substitution List

1. The crystal structure of wild-type BSLA (as an example we use
PDB ID 1i6w [47], chain A, resolution 1.5 Å) is taken from the
Protein Data Bank (www.rcsb.org) (see Note 2).
2. The PDB file is renamed after removing water molecules and
other ligands, for example, 1i6w_A_noSOL.pdb (see Note 3).
3. A list of beneficial substitutions selected for further
recombination experiments should be prepared (e.g., F17S, D64N,
G104Q, V165E). These substitutions will be filtered by the
CompassR rule after calculating their ΔΔGfold values (see Note 4).
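
As an alternative to editing the PDB file by hand, the removal of water molecules and ligands can be scripted. The following is a minimal sketch, not part of the published protocol: the function name is hypothetical, and it assumes that everything to be removed is stored in HETATM records and that only one chain is kept. Modified residues stored as HETATM records would also be dropped, so the result should be checked before use.

```python
def strip_solvent_and_ligands(pdb_text, chain="A"):
    """Keep only the ATOM records of one chain; HETATM records are dropped."""
    kept = []
    for line in pdb_text.splitlines():
        # In the PDB format, the record name starts the line and the
        # chain identifier is in column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"

# Three made-up records: chain A atom, chain B atom, and a water molecule.
example = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  N   ALA B   1      12.000   7.000  -5.000  1.00  0.00           N",
    "HETATM  100  O   HOH A 201       1.000   2.000   3.000  1.00  0.00           O",
])
print(strip_solvent_and_ligands(example))
```

For a real structure, the text of the downloaded PDB file would be read, passed through the function, and written to a new file such as 1i6w_A_noSOL.pdb.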

2.2 Setup of the FoldX Plugin for YASARA

1. Download and install YASARA Structure version 16.7.22
(http://www.yasara.org) (see Note 5).
2. After the free and simple registration for academic users on the
FoldX website (http://foldxsuite.crg.eu/), download your
system-specific FoldX Suite 4.0 (a zip file containing foldx.exe
and rotabase.txt) and the FoldX plugin for YASARA (a zip file named
yasaraPlugin.zip containing several .py files and others) (see
Note 1).

3. Installation of FoldX is straightforward, using the following
steps:
(a) Go to YASARA’s installation folder and find the
directory named “plg” (abbreviation for plugin). The
path should be similar to .\yasara\plg.
(b) Extract all files from the YASARA plugin’s zipped file and
copy them directly into the “plg” folder.
(c) Open YASARA and load any .pdb file to gain access to the
“Analyze” menu.
(d) Go to Analyze > FoldX > ConfigurePlugin.
(e) Select the foldx.exe file in the first browser window.
(f) Select the rotabase.txt file in the second browser window
(see Note 6).

3 Methods

All the procedures are carried out using the Microsoft Windows
operating system. The following five steps are also illustrated in
Fig. 2 and enable recombination variants to be built up by adding
the potential substitutions in an iterative manner.

3.1 Load the PDB File of Wild Type Enzyme (Step 1)

1. Open YASARA Structure software (see Note 7).
2. Go to File > Load > PDB file.
3. Look for the PDB file (i.e., 1i6w_A_noSOL.pdb) and click
“OK” (see Note 8).

3.2 Select the Target Substitution for Calculation of the Relative Folding Free Energy (ΔΔGfold, Step 2)

1. Go to Analyze > FoldX > Mutate residue.
2. Select a residue for FoldX analysis in the sequence menu by
double-clicking, e.g., Phe17.
3. In the “select FoldX routines” window, activate the options
“FoldX RepairPDB” and “Calculate stability change” to calculate the
relative folding free energy (ΔΔGfold = ΔGfold,sub − ΔGfold,wt),
and click “OK” to go to the next step (see Note 9).
4. Select the new amino acid residue, for example, Ser, and click
“OK.”

3.3 Set Parameters for FoldX (Step 3)

1. Keep the default option in the “Set FoldX option (1)” window,
which means that only the “Move neighbors” option is activated,
and click “OK” (see Note 10).
2. Set parameters: Number of runs: 5; Temperature (K): 298; pH: 7;
Ionic strength (×100): 5; VdW design: 2 (see Note 11).
3. Click “OK”; FoldX then starts to run, and an additional FoldX
program terminal appears.

3.4 Collection of ΔΔGfold Results and Alignment with the CompassR Rule (Step 4)

1. Collect the ΔΔGfold values (kcal/mol) of the five runs from the
YASARA console after the FoldX calculation has finished. After
pressing the space key, the results are shown in the YASARA
console as follows (see Note 12).
Plugin>nameObj | total_energy
Plugin>-------------------------+-------------
Plugin>FA17S:Object1_Repair_1_0 | -0.0153279
Plugin>FA17S:Object1_Repair_1_1 | -0.0161321
Plugin>FA17S:Object1_Repair_1_2 | -0.0263127
Plugin>FA17S:Object1_Repair_1_3 | 0.00703162
Plugin>FA17S:Object1_Repair_1_4 | 0.0459613
Plugin>nameObj | total_energy
Plugin>-----------------------+-------------
Plugin>FA17S:Object1_Repair_1 | -0.00095594

2. Select the minimum ΔΔGfold value as the final result for the
specific substitution (e.g., ΔΔGfold, F17S = −0.03 kcal/mol)
(see Note 13).
3. According to the CompassR rule and the ΔΔGfold value determined
for the substitution, place the substitution into the corresponding
category: category A: ΔΔGfold ≤ +0.36 kcal/mol; category B:
+0.36 < ΔΔGfold < +7.52 kcal/mol; category C:
ΔΔGfold ≥ +7.52 kcal/mol.
4. Repeat the above steps to calculate the ΔΔGfold values of the
remaining three single substitutions (e.g., D64N, G104Q,
V165E).
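
Collecting the per-run values and selecting the minimum can also be done with a short script once the console text has been copied. The sketch below is an illustration under assumptions: the regular expression and function name are hypothetical and are fitted only to the console layout shown in step 1, in which per-run lines end in two numeric suffixes and the averaged line ends in one.

```python
import re

console_output = """\
Plugin>nameObj | total_energy
Plugin>-------------------------+-------------
Plugin>FA17S:Object1_Repair_1_0 | -0.0153279
Plugin>FA17S:Object1_Repair_1_1 | -0.0161321
Plugin>FA17S:Object1_Repair_1_2 | -0.0263127
Plugin>FA17S:Object1_Repair_1_3 | 0.00703162
Plugin>FA17S:Object1_Repair_1_4 | 0.0459613
Plugin>nameObj | total_energy
Plugin>-----------------------+-------------
Plugin>FA17S:Object1_Repair_1 | -0.00095594
"""

# Match per-run lines such as "Plugin>FA17S:Object1_Repair_1_2 | -0.0263127".
# The averaged line (single numeric suffix) and the header lines do not match.
RUN_LINE = re.compile(r"^Plugin>(\w+):\S+_\d+_\d+\s*\|\s*(-?\d+(?:\.\d+)?)\s*$")

def per_run_ddg(text):
    """Return {substitution label: [value of each run]} parsed from console text."""
    runs = {}
    for line in text.splitlines():
        match = RUN_LINE.match(line)
        if match:
            runs.setdefault(match.group(1), []).append(float(match.group(2)))
    return runs

runs = per_run_ddg(console_output)
minimum = min(runs["FA17S"])
print(len(runs["FA17S"]), "runs; minimum =", round(minimum, 2), "kcal/mol")
```

Here the label FA17S is taken as printed by the plugin. The minimum of the five runs rounds to the value used for F17S in step 2.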

3.5 Create a Recombination Plan for Beneficial Substitutions (Step 5)

1. When single substitutions with ΔΔGfold ≤ +0.36 kcal/mol are
recombined (e.g., ΔΔGfold, F17S = −0.03 kcal/mol, ΔΔGfold,
D64N = +0.09 kcal/mol), one can expect active recombinants
with improved properties (green), e.g., F17S-D64N (see upper
part in Fig. 1).
2. When beneficial substitutions with ΔΔGfold values ranging from
+0.36 to +7.52 kcal/mol are recombined (e.g., ΔΔGfold,
V165E = +4.89 kcal/mol), one cannot predict whether the
recombinants will be inactive or active (unpredictable behavior;
orange), for example, F17S-V165E and others (see middle part
in Fig. 1).
3. Recombination of beneficial substitutions with
ΔΔGfold ≥ +7.52 kcal/mol (e.g., ΔΔGfold,
G104Q = +14.38 kcal/mol) results in activity-reduced or even
deactivated recombinants (red, e.g., F17S-G104Q, see lower
part in Fig. 1), which should be discarded from the next round
of recombination (see Note 14).
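
The planning step can be sketched as follows for the four example substitutions. This is an illustration rather than the authors' implementation: the function name is hypothetical, the boundary values are assumed to fall into categories A and C, respectively, and F17S is taken as −0.03 kcal/mol (the rounded minimum of the five runs in Subheading 3.4).

```python
from itertools import combinations

# Example ΔΔGfold values (kcal/mol) for BSLA quoted in this protocol.
ddg = {"F17S": -0.03, "D64N": 0.09, "V165E": 4.89, "G104Q": 14.38}

def recombination_plan(ddg_values, a_max=0.36, c_min=7.52):
    """Group substitutions by category and list all category A combinations."""
    groups = {"A": [], "B": [], "C": []}
    for name, value in ddg_values.items():
        key = "A" if value <= a_max else "C" if value >= c_min else "B"
        groups[key].append(name)
    # Recombine category A substitutions first (pairs, triples, ...).
    recombinants = ["-".join(combo)
                    for size in range(2, len(groups["A"]) + 1)
                    for combo in combinations(groups["A"], size)]
    return groups, recombinants

groups, recombinants = recombination_plan(ddg)
print("Recombine first (A):", recombinants)
print("Backup only (B):", groups["B"])
print("Discard (C):", groups["C"])
```

With these inputs the only category A recombinant is F17S-D64N, V165E is kept as a backup, and G104Q is discarded, in line with steps 1 to 3.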

4 Notes

1. A standard PC (e.g., Intel® Core™ i5 CPU 2.50 GHz; RAM
8 GB) is required to install YASARA Structure and FoldX. In
Microsoft Windows, the FoldX executable supports only 32-bit
mode, but 32-bit applications can run on 64-bit systems. To
uninstall FoldX, simply delete foldx.exe and rotabase.txt, because
FoldX is a self-contained binary. The FoldX plugin is written in
Python and is supported on Linux, MacOSX, and Microsoft Windows.
In addition, standalone FoldX can be used without YASARA, and it
is compatible with other programming languages such as Python,
shell, and Java. An updated version of FoldX is acceptable and
recommended. YASARA Structure is a molecular graphics, modeling,
and simulation program for Linux, Windows, and MacOS.
2. If several X-ray or NMR structures are available, or the
protein of interest is a dimer instead of a monomer, the structure
with the highest resolution is preferred. As FoldX is sensitive
to protein structure, higher resolution crystal structures of
proteins (better than 3.3 Å) improve the performance of FoldX in
predicting stability trends and its quantitative accuracy. If no
X-ray or NMR structure is available, a homology model of the
structure has to be built using homology modeling tools such as
YASARA [32], I-Tasser [33], Phyre2 [34], and Rosetta [35], but
limited accuracy of the homology model will negatively affect the
reliability of the ΔΔGfold calculation and even of the CompassR
prediction.
3. All water molecules and ligands in the PDB file should be
removed via text editors (e.g., Notepad++, WordPad, BBEdit)
or PDB visualization software (e.g., YASARA, Pymol, Discov-
ery Studio Visualizer).
4. Beneficial substitutions can be obtained from directed evolu-
tion and/or (semi)-rational design.
5. An updated version of YASARA is recommended. “YASARA
structure” is the highest stage of YASARA, which requires a
license fee. Fortunately, the free initial stage “YASARA View”
can completely meet the requirements of CompassR, and can
be downloaded from http://www.yasara.org/viewdl.htm.
Besides, other stages of YASARA (e.g., Model, Dynamics,
Twinset) are also applicable. Python 2.X or Python 3.X is

needed to use the following FoldX plugin in YASARA. Open


YASARA and click Help > Install > Python. If this option is
not clickable, Python is already installed. Python is installed by
default on Linux and MacOS machines. Otherwise, it can be
downloaded from http://www.python.org/download.
6. Spaces should be avoided in the location of either foldx.exe,
rotabase.txt, or installed YASARA software.
7. In YASARA, actions can be performed not only through the
graphical user interface but also with YASARA commands, which can
be typed directly in the YASARA console. To ensure a fresh startup
state, we recommend that the user clear all objects and bitmaps or
reset YASARA entirely.
8. The YASARA command “LoadPDB 1i6w_A_noSOL.pdb” is also
available to load the PDB structure. Structures can only be
provided in PDB format. Naming a molecule or object with a
specific number should be avoided.
9. The structure of BSLA wild type was rotamerized and energy
minimized using the “RepairObject” command to correct the
residues that have nonstandard torsion angles and other fre-
quent problems in PDB structures, like steric clashes. Some
side chains moved slightly during “RepairObject.” The
repaired structure is loaded as one new YASARA object and
superposed with the original structure. Any energy calculation
using FoldX requires minimizing the structure in advance to
get reliable results.
10. The “Move neighbours” option is on by default, which allows
neighboring residues around the substitution site (<6 Å) to
be moved to accommodate the newly substituted residue. The
moved neighboring residues can be visualized on the graphical
interface after the whole process has finished. If you do not
want these neighbours to move, please untick the option.
11. Five FoldX runs were performed for each substitution to
ensure that the minimum energy conformation is generated even for
large residues that possess many rotamers. However, a high number
of runs (max. 5) increases the time needed to finish the
mutagenesis. We prefer to set parameters (e.g., temperature, pH)
in accordance with the experimental conditions. It takes several
seconds to 1–2 min to generate a single substituted structure,
depending on the size of the amino acid residues.
12. The results can be saved via Analyze > FoldX > save last
calculation. The difference in free energy (ΔΔGfold) is given by
ΔGsub − ΔGwt. In the file “Raw_Object1_Repair.fxout”, you
can retrieve the energies of the five runs for both the WT and the
substitution. Note that the absolute values of ΔGwt and ΔGsub are
meaningless because they do not approach the experimental values.
If the amino acid sequence in the PDB structure does not start at
the first residue, the position numbers of the substitutions in
the results will be shifted by the number of missing amino acids.
13. The more negative the ΔΔGfold value, the higher the
stability. Generally, a substitution with ΔΔGfold > 0 kcal/mol
will destabilize the structure, while a substitution with
ΔΔGfold < 0 kcal/mol will stabilize the structure. A usual
threshold is that a substitution with |ΔΔGfold| > 1 kcal/mol has
a significant effect, which is roughly equivalent to the
energy of a hydrogen bond.
14. CompassR greatly increases the cooperative and additive
effects among either isolated or neighboring amino acid
positions. The standard deviation of the FoldX method in the
prediction of ΔΔGfold is reported as 0.46 kcal/mol [1]. In
addition, the thermodynamic stability varies from protein to
protein and depends on the kinetic folding/unfolding processes,
equilibrium stability, and the temperature at which assays are
conducted [1]. Therefore, the exact CompassR thresholds might
change slightly depending on the type of protein and its fold.
Furthermore, the CompassR thresholds can only be used for the
prediction of single substitutions from the wild-type enzyme. For
recombination candidates with double or multiple substitutions,
extra experiments might be required to validate the accuracy and
reliability of the current thresholds.

References
1. Cui H, Cao H, Cai H, Jaeger K-E, Davari MD, Schwaneberg U (2020) Computer-assisted recombination (CompassR) teaches us how to recombine beneficial substitutions from directed evolution campaigns. Chem Eur J 26(3):643–649. https://doi.org/10.1002/chem.201903994
2. Bornscheuer UT, Hauer B, Jaeger KE, Schwaneberg U (2019) Directed evolution empowered redesign of natural proteins for the sustainable production of chemicals and pharmaceuticals. Angew Chem Int Ed 58(1):36–40. https://doi.org/10.1002/anie.201812717
3. Liebeton K, Zonta A, Schimossek K, Nardini M, Lang D, Dijkstra BW, Reetz MT, Jaeger KE (2000) Directed evolution of an enantioselective lipase. Chem Biol 7(9):709–718
4. Bhuiya M-W, Liu C-J (2010) Engineering monolignol 4-O-methyltransferases to modulate lignin biosynthesis. J Biol Chem 285(1):277–285
5. Rowe LA, Geddie ML, Alexander OB, Matsumura I (2003) A comparison of directed evolution approaches using the β-glucuronidase model system. J Mol Biol 332(4):851–860
6. Bloom JD, Meyer MM, Meinhold P, Otey CR, MacMillan D, Arnold FH (2005) Evolving strategies for enzyme engineering. Curr Opin Chem Biol 15(4):447–452
7. Rübsam K, Davari MD, Jakob F, Schwaneberg U (2018) KnowVolution of the polymer-binding peptide LCI for improved polypropylene binding. Polymers 10(4):423
8. Tokuriki N, Tawfik DS (2009) Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 19(5):596–604
9. Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS (2006) Robustness–epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444(7121):929–932. https://doi.org/10.1038/nature05385
10. Firnberg E, Labonte JW, Gray JJ, Ostermeier M (2014) A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol 31(6):1581–1592
11. Soskine M, Tawfik DS (2010) Mutational effects and the evolution of new protein functions. Nat Rev Genet 11(8):572
12. Bloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability. Proc Natl Acad Sci U S A 103(15):5869–5874
13. Sarkisyan KS, Bolotin DA, Meer MV, Usmanova DR, Mishin AS, Sharonov GV, Ivankov DN, Bozhanova NG, Baranov MS, Soylemez O (2016) Local fitness landscape of the green fluorescent protein. Nature 533(7603):397
14. Studer RA, Christin PA, Williams MA, Orengo CA (2014) Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. Proc Natl Acad Sci U S A 111(6):2223–2228. https://doi.org/10.1073/pnas.1310811111
15. Tokuriki N, Stricher F, Serrano L, Tawfik DS (2008) How protein stability and new functions trade off. PLoS Comput Biol 4(2):e1000002
16. Yu H, Yan Y, Zhang C, Dalby PA (2017) Two strategies to engineer flexible loops for improved enzyme thermostability. Sci Rep 7:41212
17. Chen C, Lin J, Chu Y (2013) iStable: off-the-shelf predictor integration for predicting protein stability changes. BMC Bioinformatics 14(2):S5. https://doi.org/10.1186/1471-2105-14-s2-s5
18. Frauenkron-Machedjou VJ, Fulton A, Zhao J, Weber L, Jaeger K-E, Schwaneberg U, Zhu L (2018) Exploring the full natural diversity of single amino acid exchange reveals that 40–60% of BSLA positions improve organic solvents resistance. Biores Bioprocess 5(1):2
19. Parra LP, Agudo R, Reetz MT (2013) Directed evolution by using iterative saturation mutagenesis based on multiresidue sites. Chembiochem 14(17):2301–2309
20. Ensari Y, Dhoke GV, Davari MD, Ruff AJ, Schwaneberg U (2018) A comparative reengineering study of cpADH5 through iterative and simultaneous multisite saturation mutagenesis. Chembiochem 19(14):1563–1569
21. Sun Z, Lonsdale R, Kong XD, Xu JH, Zhou J, Reetz MT (2015) Reshaping an enzyme binding pocket for enhanced and inverted stereoselectivity: use of smallest amino acid alphabets in directed evolution. Angew Chem Int Ed 54(42):12410–12415
22. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L (2005) The FoldX web server: an online force field. Nucleic Acids Res 33(suppl_2):W382–W388
23. Reetz MT, Bocola M, Carballeira JD, Zha D, Vogel A (2005) Expanding the range of substrate acceptance of enzymes: combinatorial active-site saturation test. Angew Chem Int Ed 117(27):4264–4268
24. Gonzalez-Perez D, Molina-Espeja P, Garcia-Ruiz E, Alcalde M (2014) Mutagenic organized recombination process by homologous in vivo grouping (MORPHING) for directed enzyme evolution. PLoS One 9(3):e90919
25. Stemmer WP (1994) Rapid evolution of a protein in vitro by DNA shuffling. Nature 370(6488):389
26. Zhao H, Zha W (2006) In vitro ‘sexual’ evolution through the PCR-based staggered extension process (StEP). Nat Protoc 1(4):1865–1871
27. Mateljak I, Rice A, Yang K, Tron T, Alcalde M (2019) The generation of thermostable fungal laccase chimeras by SCHEMA-RASPP structure-guided recombination in vivo. ACS Synth Biol 8(4):833–843. https://doi.org/10.1021/acssynbio.8b00509
28. Otey CR, Landwehr M, Endelman JB, Hiraga K, Bloom JD, Arnold FH (2006) Structure-guided recombination creates an artificial family of cytochromes P450. PLoS Biol 4(5):e112
29. Meyer MM, Hochrein L, Arnold FH (2006) Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Eng Des Sel 19(12):563–570
30. Voigt CA, Martinez C, Wang Z-G, Mayo SL, Arnold FH (2002) Protein building blocks preserved by recombination. Nat Struct Mol Biol 9(7):553–558. https://doi.org/10.1038/nsb805
31. Marienhagen J, Dennig A, Schwaneberg U (2012) Phosphorothioate-based DNA recombination: an enzyme-free method for the combinatorial assembly of multiple DNA fragments. BioTechniques 52(5):1–6. https://doi.org/10.2144/000113865
32. Land H, Humble MS (2018) YASARA: a tool to obtain structural guidance in biocatalytic investigations. In: Protein engineering. Springer, pp 43–67
33. Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9(1):40
34. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10(6):845
35. Rohl CA, Strauss CE, Misura KM, Baker D (2004) Protein structure prediction using Rosetta. In: Methods enzymol, vol 383. Elsevier, pp 66–93
Chapter 3

Nondegenerate Saturation Mutagenesis: Library
Construction and Analysis via MAX and ProxiMAX
Randomization
Anupama Chembath, Ben P. G. Wagstaffe, Mohammed Ashraf,
Marta M. Ferreira Amaral, Laura Frigotto, and Anna V. Hine

Abstract
Protein engineering can enhance desirable features and improve performance outside of the natural
context. Several strategies have been adopted over the years for gene diversification, and engineering of
modular proteins in particular is most effective when a high-throughput, library-based approach is
employed. Nondegenerate saturation mutagenesis plays a dynamic role in engineering proteins by targeting
multiple codons to generate massively diverse gene libraries. Herein, we describe the nondegenerate
saturation mutagenesis techniques that we have developed for contiguous (ProxiMAX) and noncontiguous
(MAX) randomized codon generation to create precisely defined, diverse gene libraries, in the context of
other fully nondegenerate strategies. ProxiMAX randomization comprises saturation cycling with repeated
cycles of blunt-ended ligation, type IIS restriction, and PCR amplification, and is now a commercially
automated process predominantly used for antibody library generation. MAX randomization encompasses
a manual process of selective hybridisation between individual custom oligonucleotide mixes and a conven-
tionally randomized template and is principally employed in the research laboratory setting, to engineer
alpha helical proteins and active sites of enzymes. DNA libraries generated using either technology create
high-throughput amino acid substitutions via codon randomization, to generate genetically diverse clones.

Key words Nondegenerate, Saturation mutagenesis, Randomized gene libraries, Codon randomiza-
tion, Protein engineering, Library design, Genetic code, Amino acids, Genetic diversity,
Oligonucleotides

1 Introduction

Directed evolution utilizes nature’s biosynthetic machinery, via
protein engineering, in order to generate new or improved proteins
with preferred properties [1]. Protein engineering has emerged as a
powerful means for modifying the activities of macromolecules for
various research and industrial applications, and prospective fields

Anupama Chembath and Ben P. G. Wagstaffe contributed equally to this work.

Andrew Currin and Neil Swainston (eds.), Directed Evolution: Methods and Protocols, Methods in Molecular Biology, vol. 2461,
https://doi.org/10.1007/978-1-0716-2152-3_3, © Springer Science+Business Media, LLC, part of Springer Nature 2022


of application of these engineered protein resources extend over
several disciplines from chemical to pharmaceutical and agricultural
industries. Structured and precise mutagenesis and selection tech-
niques evolve the protein in a directed manner and allow research-
ers to exploit mutations in a targeted fashion enabling a
comprehensive investigation of a gene’s sequence space [2]. This
approach permits selection of variants with anticipated molecular
features. Variants are selected from pools of DNA libraries, which
are expressed and screened for desired properties, typically over
multiple cycles [3]. Beneficial mutations accumulate through the testing
of hundreds of thousands of variants in each generation, so it is
critical to generate high-quality, diverse gene libraries. A
wide-ranging selection of mutational methods and gene diversifica-
tion strategies exist, each with their own properties and utility [4].

1.1 Saturation Mutagenesis

Saturation mutagenesis (also known as site saturation mutagenesis
or oligonucleotide-directed randomization) is a widely used random
mutagenesis technique in which a single codon or collection
of codons are identified based on their ability to accommodate
beneficial mutations. Traditionally, they were randomized with all
possible amino acids at those positions [5]. Basic saturation mutagenesis
uses the conventional degenerate codon NNN (N = A/T/G/C)
in targeted positions, allowing for the incorporation of any of
the 64 codons. The genetic code is degenerate in nature, since the
20 canonical amino acids are encoded by 61 sense codons, resulting
in the same amino acid being coded by more than a single codon.
The degeneracy of the genetic code thus leads to an inherent bias
toward the representation of some codon groups in the DNA
library over others, in turn leading to disproportionate encoding
of amino acids. This redundancy can be reduced by the use of NNN
variants, for example, NNK, NNM (K = G/T; M = C/A), or
others depending on the requirements of the user. Even with the
reduced degeneracy, these traditional methods still have consider-
able problems with amino acid bias and incorporation of termina-
tion codons, particularly as the number of target codons increases.
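
The bias described above can be checked by simply enumerating the codons that a degenerate scheme encodes. The following is a minimal sketch in Python, assuming the standard genetic code; the function names are illustrative and are not part of any published tool.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the ambiguity codes used in the text are listed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "M": "AC"}


def expand(degenerate_codon):
    """Return every concrete codon encoded by a degenerate codon such as NNK."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def encoding_summary(degenerate_codon):
    """Count codons, stop codons and copies per amino acid for one scheme."""
    codons = expand(degenerate_codon)
    counts = Counter(CODON_TABLE[c] for c in codons)
    stops = counts.pop("*", 0)
    return {
        "codons": len(codons),
        "stops": stops,
        "amino_acids": len(counts),
        "max_copies": max(counts.values()),
        "min_copies": min(counts.values()),
    }


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNM"):
        print(scheme, encoding_summary(scheme))
```

In this enumeration, NNK halves the number of codons relative to NNN but still encodes one termination codon and represents some amino acids three times and others only once, which is the residual bias referred to in the text.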

1.2 Trinucleotide Phosphoramidites (TRIM)

One of the earliest approaches to nondegenerate saturation mutagenesis,
trinucleotide phosphoramidite (TRIM) technology, relies
on the ability of predetermined trinucleotide phosphoramidites
(TPs), combined in a user-defined pool, to saturate codons. During
oligonucleotide synthesis, DNA extension is normally achieved by
adding a single base at a time in individual reactions, while TRIM is
capable of adding three bases at once.
Originally, based on the work by Holm in 1986 [6], less
common trinucleotides for each amino acid were excluded and
20 trinucleotides, each representing one amino acid, were gener-
ated from one of seven different dinucleotides (AT, CT, GT, TT,
AG, GG, TG) and two different mononucleotides (T or G). In
order to prevent undesirable reactions during DNA synthesis, components
of the trinucleotide were protected using methyl groups
for the 5′ phosphate esters and phenoxyacetyl for the 3′ hydroxyl
groups [7]. Conditions for coupling of the TPs were similar to
those of mononucleotide couplings, with each of the 20 TPs used
in individual automated oligonucleotide synthesis experiments
using an Applied Biosystems 380B DNA synthesizer. Coupling yields
for the TPs were enhanced via the introduction of increased cou-
pling times followed by a second round of coupling for each TP,
resulting in yields of 96–98.5% (mononucleotide yields being
98–99.5%) [7]. Subsequently, this original method of generating
TPs has been simplified by obtaining commercially manufactured
codons [7].
The success of the technique depends on generating prede-
fined, equimolar mixtures of TPs in which each component TP
encodes a single amino acid. That mixture is then used in standard
oligonucleotide synthesis. Virnekas et al. [7] exemplified the
approach by encoding eight hydrophobic amino acids in specified
positions and using the resulting, newly synthesized oligonucleo-
tides mixture as primers to amplify a 2H-10 antibody fragment.
Cloning and Sanger sequencing demonstrated that all expected
trinucleotides were incorporated into the resulting mixture of
genes, though individual codons were not present in equal propor-
tions, most likely as a consequence of differing coupling efficiencies
of the TPs [7]. Though containing some degree of bias in saturated
positions, TRIM nevertheless allowed for a subset of codons to
saturate multiple positions and so provided the first key step in
achieving truly efficient saturation mutagenesis.
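
The effect of coupling yield on a TRIM synthesis can be illustrated with simple arithmetic. The sketch below is written under two stated assumptions that are not taken from the source: that the fraction of full-length product is the product of the stepwise coupling yields, and that TPs in an equimolar mixture are incorporated in proportion to their relative coupling efficiencies. The efficiency values in the example are hypothetical.

```python
from math import prod


def full_length_fraction(stepwise_yields):
    """Fraction of chains that survive every coupling step (assumed model)."""
    return prod(stepwise_yields)


def relative_representation(efficiencies):
    """Expected codon proportions if incorporation scales with efficiency.

    `efficiencies` maps a codon to its (hypothetical) coupling efficiency.
    """
    total = sum(efficiencies.values())
    return {codon: value / total for codon, value in efficiencies.items()}


if __name__ == "__main__":
    # Three randomized positions at the lower and upper TP yields quoted in the text.
    print(full_length_fraction([0.96] * 3))
    print(full_length_fraction([0.985] * 3))
    # Hypothetical efficiencies for three TPs in one equimolar pool.
    print(relative_representation({"GCT": 0.985, "CTG": 0.96, "AAG": 0.97}))
```

Even modest differences in efficiency therefore translate directly into unequal codon proportions, in line with the sequencing result reported for the 2H-10 library.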

1.3 MAX Randomization

We developed MAX randomization [8] in order to eliminate the
degeneracy of the genetic code by using a one-to-one codon to
amino acid ratio, so eliminating representational bias in engineered
libraries, but without the need for specialized chemistry. The
method encodes each amino acid via a single codon, thereby
removing redundant and termination codons, so resulting in the
generation of highly diverse gene libraries that lack encoded amino
acid bias. The approach has the additional benefit of minimizing
DNA library size. Technically, MAX randomization employs hybri-
dization of single-stranded selection oligonucleotides to a degener-
ate template randomized by conventional (commercial) degenerate
synthesis of NNN at relevant codons. The selection oligonucleo-
tides literally “select” the required codons from the degenerate
template strand, which acts merely as a docking station during
construction of the cassette. Selection oligonucleotides are typically
nine bases long and contain six invariant bases that act as a localiza-
tion or addressing sequence to ensure accurate annealing to the
required part of the template strand. The remaining three bases are
the MAX codon, which is user-defined to encode the desired amino
acid in the peptide/protein library. The MAX codon can be located
at either end or in the middle of each 9-mer selection oligo. Each
randomized position is encoded by up to 20, individually synthesized
selection oligonucleotides (the price of 9-mer oligonucleotides is now
minimal) or a smaller number to encode user-defined selections
of amino acids as required at each position (see Notes 1 and 2).
Oligonucleotide selection pools encoding the pertinent MAX
codons are generated in equimolar volumes for each position of
selective hybridization, to generate a randomization cassette. The
technique thus allows creation of designer libraries where the user
has complete control over the group of codons in each position
assigned for randomization.
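
The design of a selection pool can be sketched in a few lines of Python. This is an illustrative sketch only: the addressing sequence and the codon choices below are hypothetical placeholders, and in practice MAX codons would be chosen for the expression organism and the addressing region taken from the gene of interest.

```python
def selection_oligo(address, max_codon, position="end"):
    """Build one 9-mer selection oligo from a 6-base address and a MAX codon.

    `position` places the codon at the start, middle or end of the 9-mer.
    """
    if len(address) != 6 or len(max_codon) != 3:
        raise ValueError("expected a 6-base address and a 3-base codon")
    if position == "start":
        return max_codon + address
    if position == "middle":
        return address[:3] + max_codon + address[3:]
    if position == "end":
        return address + max_codon
    raise ValueError("position must be 'start', 'middle' or 'end'")


def selection_pool(address, codons_by_aa, position="end", stock_uM=100.0):
    """Return the oligos for one position and each oligo's pooled concentration.

    Assumes equal volumes of equally concentrated stocks are pooled undiluted.
    """
    oligos = {aa: selection_oligo(address, codon, position)
              for aa, codon in codons_by_aa.items()}
    return oligos, stock_uM / len(oligos)


if __name__ == "__main__":
    # Hypothetical subset of four amino acids and a hypothetical address.
    example_codons = {"A": "GCG", "L": "CTG", "K": "AAA", "D": "GAT"}
    oligos, each_uM = selection_pool("ACGTCA", example_codons)
    for aa, seq in oligos.items():
        print(aa, seq)
    print("each oligo in pool (uM):", each_uM)
```

Restricting `codons_by_aa` to a subset of amino acids is all that is needed to define a custom pool for a given position.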
Figure 1a illustrates the concept, in which three positions of
interest are saturated with three different MAX oligonucleotide
pools. The MAX oligonucleotides anneal to the complementary
regions of the invariant template sequence and are flanked by con-
served regions which are used for PCR priming and amplification.
The hybridized selection oligonucleotides and flanking sequences
(End 1 and End 2) are ligated in the precise combination and
orientation, thanks to their addressed annealing, to produce a
DNA strand containing the required combination(s) of MAX
codons at each of the specified, saturated positions. After ligation,
an asymmetric PCR amplification of the strand containing MAX
codons is achieved using primers that hybridize to the conserved
flanking sequences. Here, the length of overlaps between the flank-
ing oligonucleotides (End 1 and End 2) and the conventionally
randomized template strand are critical and must be sufficiently
short to preclude amplification of the template strand itself. In
practice, we find that an overlap of 6 bp is effective (Fig. 1b), whilst
as little as 9 bp can lead to amplification of the strand containing
NNN codons, despite annealing temperature calculations which
suggest that no such amplification should occur (see Note 3).
Figure 1b illustrates an exemplar of a specific design of a MAX
randomization cassette containing four saturated positions.

1.4 ProxiMAX Randomization

Although MAX randomization can saturate multiple codons, its
drawback is its inability to saturate more than two adjacent codons
at a time. This results from the essential presence of the conserved
addressing region within each selection oligo (Fig. 1b). We there-
fore developed ProxiMAX randomization, a nondegenerate muta-
genesis technique designed to saturate multiple, contiguous
codons [9, 10]. ProxiMAX randomization relies on saturation
cycling, which entails repeated cycles of blunt-ended ligation, type
IIS restriction and PCR amplification. Figure 2 demonstrates the
ProxiMAX randomization technique for saturating contiguous
codons. The technique utilizes four groups of oligonucleotides:
three donor pools and an acceptor sequence. The donor pools
possess MAX codons at their 3′ ends, encoding each of the desired

Fig. 1 Schematic representation of the MAX randomization process (Subheading 3.1). A single template
oligonucleotide is synthesized to be fully degenerate at the designated, saturated codons. Meanwhile, a set of
up to 20 small selection oligonucleotides are synthesized individually, for each saturated position. Each such
oligonucleotide contains a conserved addressing region, typically of six bases and one MAX codon, which
represents the preferred codon for subsequent expression of a single amino acid. (a) Schematic showing three
positions being saturated with individual oligonucleotide pools. Each selection oligonucleotide consists of a
short six base invariant region, complementary to the template and one MAX codon. The selection oligonu-
cleotides, along with two flanking oligonucleotides, anneal to the template and are ligated together. The
ligated strand is then selectively amplified with primers complementary to the terminal oligonucleotides
(P1 and P2), to generate a randomization cassette. (b) Exemplar design of a cassette containing four MAX
randomized (saturated) codons in a section of a tyrosyl tRNA synthetase gene

[Figure 2 schematic. Donor sets 1–3 (each with its own primer, P1–P3, a MlyI site and a MAX codon) and 5′-phosphorylated acceptor DNA (Rev primer): ligate/combine/amplify; MlyI digest (purify); repeat with next donor set. After 6 cycles, the completed acceptor carrying six MAX codons is ligated to the required constant region.]

Fig. 2 Diagrammatic representation of ProxiMAX randomization (Subheading 3.5). ProxiMAX randomization
uses four different groups of oligonucleotides: three donor sets and one acceptor. Members of each donor set
share a common, generic sequence and a central, unpaired four base hairpin (typically of sequence TTTT) but
all contain a MlyI restriction site situated five bases upstream of the MAX codon. As with MAX randomization,
each member of a set contains one MAX codon, which represents the preferred codon for subsequent
expression of a single amino acid. The donor and acceptor are ligated together via blunt-end ligation,
amplified via PCR (either in pools or individually) and purified. This product is then digested with MlyI to
expose the MAX codon now at the 5′ end of the acceptor sequence. This newly elongated acceptor is then
used as the acceptor for the next round. To prevent carry over from previous cycles, different donor sets are
cycled. If required, gel purification of the digested product may be performed, typically after three sets of
addition

amino acids defined by the user (see Notes 1 and 2). The donors
can be partially double-stranded oligonucleotides, fully double-
stranded DNA or self-complementary hairpin oligonucleotides,
though experience favors the latter.
Typically, individual donor oligos are pooled and blunt-end
ligated onto a double-stranded, 5′ phosphorylated, acceptor
sequence. Ligation is followed by PCR amplification, digestion
with the type IIS restriction enzyme, MlyI and purification of the
resulting product. The MlyI recognition site lies upstream of the
MAX codon on the donor side of the sequence, and the enzyme cuts
downstream of its recognition site, creating a blunt end. Thus MlyI
digestion of the ligated product has the effect of transferring the
MAX codon mixture defined in any one cycle, from the donor
sequences to the acceptor. The process is then repeated, to transfer
additional, defined mixtures of codons onto the growing acceptor
sequence, adding one codon at a time to a growing DNA chain
[10, 11]. Alternating between different sets of donor sequences
accommodates specific primer annealing and PCR amplification so
that no crossover occurs between rounds (Fig. 2). It also allows for
defined codon addition per cycle. If required, similar, more precise
ratio control can also be achieved in the manual format, by
performing individual, parallel ligations, amplifications and quanti-
fications prior to pooling and MlyI digestion as demonstrated in the
synthesis of an anti-NGF peptide library [12].
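
The logic of saturation cycling can be followed in silico with a toy model of the top strand. The sketch below is a simplification under stated assumptions: the MlyI recognition sequence is taken as GAGTC with a blunt cut five bases downstream, the generic donor sequence and spacer are hypothetical, the hairpin and PCR steps are ignored, and none of the sequences contains a second MlyI site.

```python
MLYI_SITE = "GAGTC"   # recognition sequence assumed for this sketch
CUT_OFFSET = 5        # bases between the site and the blunt cut


def make_donor(generic, spacer, max_codon):
    """Top strand of a donor: generic region, MlyI site, spacer, MAX codon."""
    if len(spacer) != CUT_OFFSET:
        raise ValueError("spacer must place the codon directly after the cut")
    return generic + MLYI_SITE + spacer + max_codon


def ligate(donor, acceptor):
    """Blunt-end ligation of a donor onto the acceptor (top strand only)."""
    return donor + acceptor


def mlyi_digest(ligated):
    """Remove the donor, leaving its MAX codon on the acceptor."""
    site = ligated.find(MLYI_SITE)
    if site < 0:
        raise ValueError("no MlyI site found")
    return ligated[site + len(MLYI_SITE) + CUT_OFFSET:]


def saturation_cycling(acceptor, codon_pools, generic="ATATATAT", spacer="ACACA"):
    """Add one pool of MAX codons per cycle; return all resulting sequences."""
    products = {acceptor}
    for pool in codon_pools:
        products = {
            mlyi_digest(ligate(make_donor(generic, spacer, codon), product))
            for codon in pool
            for product in products
        }
    return products


if __name__ == "__main__":
    # Two cycles with hypothetical codon pools on a hypothetical acceptor.
    for seq in sorted(saturation_cycling("TTTGGG", [["GCG", "CTG"], ["AAA", "GAT"]])):
        print(seq)
```

Each cycle places the new codon at the 5′ end of the growing acceptor, so the last pool listed ends up furthest upstream in the finished cassette.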
Automation of the technique permits the addition of two
codons per cycle rather than just one, thereby enhancing the effi-
ciency of the system. ProxiMAX randomization, thus refined for an
automated platform, was commercialized as Colibra™ (Isogenica
Ltd.) for antibody library generation using automated hexamer
codon addition. In this format, ProxiMAX randomization has
been used to saturate up to 24 contiguous codons [11]. Here, we
illustrate the use of automated ProxiMAX to saturate five codons
within the putative active site of the E. coli alanyl tRNA synthetase
gene (Fig. 3).

1.5 Other Routes to Nondegenerate Saturated DNA Libraries

Slonomics employs a fully automated, proprietary platform to synthesize
randomized cassettes from thousands of hairpin oligonucleotides
in order to generate highly diverse combinatorial gene
libraries [14, 15]. Unfortunately, this patented technology is no
longer widely available to the scientific community as a synthetic
service. Thus commercial preparation of nondegenerate saturated
libraries is currently limited to massively parallel oligonucleotide/
gene synthesis.

1.6 Library Size, Sequence Space, and Screening

Library size is determined by a number of factors including the
nature of the genetic code, type of mutagenic codon used in library
design, and the number of designated sites for mutagenesis within
the library [16]. Removing DNA degeneracy from the randomiza-
tion scheme maximizes mutagenesis potential by optimizing the
use of sequence space. The sequence space occupied by a DNA
library is the proportion of required sequences that can actually be
created, physically. Sequence space is a fundamental consideration
when using saturation mutagenesis techniques, as the number of
required sequences for full diversity (all of the different randomiza-
tion combinations) should be below the maximum possible num-
ber of sequences that can be synthesized (and ideally, subsequently
screened) using any given approach. Although generation of mas-
sive molecular diversity is theoretically possible, the practicality of
generating such libraries and the ability to screen them will always

Fig. 3 Observed vs expected distribution of codons in six saturated positions within an E. coli AlaRS gene. Two
cassettes encompassing positions 41 & 43 and 212, 214, and 216 respectively were constructed using
automated ProxiMAX addition [11] of hexamer donors (positions 43, 216, and 214; positions 42, 215, and
213 were specified as conserved codons) and single codon donors (positions 41 and 212). These cassettes
were then joined to three framework sections via ligation of BsaI-digested fragments. Position 170 was
contained within one of these fragments and owing to its isolation within the gene was simply constructed
using a carefully balanced mixture of PCR primers [13]. NGS analysis of the resulting library was performed
using Isogenica’s proprietary software

remain a limiting factor. Our library construction workflow has
demonstrated that a library size of up to 10^13 sequences is realistic,
in practical terms, to be constructed in a typical research laboratory
environment, whereas libraries larger than that can only be sampling
in nature. However, it should be noted that most of the high
throughput screening/display methodologies available can only
accommodate libraries up to the order of 10^9 to 10^13 sequences
(depending on whether cloning or wholly synthetic approaches are
employed for expression) owing to DNA mass and transformation
efficiencies. Whether by phage display [17], yeast display [18] or
wholly synthetic methodologies such as ribosome display [19] and
cis-display [20], most library screens rely on one form or another of
biopanning, which in turn relies on the physical principle of mass
action. As such, unbiased encoding of the component proteins is
critical in order for screening to deliver optimized proteins. Non-
degenerate saturation delivers both the smallest possible libraries
and unbiased encoding. Thus, while it is challenging to determine
universal guidelines for optimal library size in a saturation muta-
genesis experiment, an over-arching principle is “smallest is best.”
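
The "smallest is best" principle can be made concrete by comparing the number of DNA sequences required by different randomization schemes. The following sketch assumes 64, 32 and 20 codons per position for NNN, NNK and MAX/ProxiMAX respectively, and uses the 10^13 ceiling quoted above as the default limit; the function names are illustrative.

```python
def library_size(n_positions, codons_per_position):
    """Number of distinct DNA sequences for full saturation of n positions."""
    return codons_per_position ** n_positions


def max_positions(codons_per_position, limit=10**13):
    """Largest number of fully saturated positions that fits within `limit`."""
    n = 0
    while codons_per_position ** (n + 1) <= limit:
        n += 1
    return n


def stop_free_fraction(n_positions, sense_codons, total_codons):
    """Fraction of library members with no termination codon at any position."""
    return (sense_codons / total_codons) ** n_positions


if __name__ == "__main__":
    schemes = {"NNN": 64, "NNK": 32, "MAX": 20}
    for name, codons in schemes.items():
        print(name, "6 positions:", library_size(6, codons),
              "| positions within 1e13:", max_positions(codons))
    print("NNN, 6 positions, stop-free fraction:",
          round(stop_free_fraction(6, 61, 64), 3))
```

On these assumptions a nondegenerate scheme accommodates two more fully saturated positions than NNN within the same physical limit, with every sequence encoding a distinct, full-length protein.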

2 Materials

Pfu polymerase buffer (1×):

Tris-HCl (pH 8.8)     0.02 M
KCl                   0.01 M
(NH4)2SO4             0.01 M
MgSO4                 0.002 M
Triton® X-100         0.1%
BSA                   0.1 mg/ml

T4 DNA ligase buffer (1×):

Tris-HCl (pH 7.5)     0.05 M
MgCl2                 0.01 M
Dithiothreitol        0.01 M
ATP                   0.001 M

CutSmart® buffer (1×):

Potassium acetate         50 mM
Tris-acetate (pH 7.9)     20 mM
Magnesium acetate         10 mM
BSA                       100 μg/ml

Reagents:

Oligonucleotides/synthetic DNA templates     Multiple suppliers available
50× TAE stock                                Thermo Scientific
Molecular biology grade agarose              Bioline
Ethidium bromide                             ThermoFisher
Pfu DNA polymerase                           Promega
T4 DNA ligase                                NEB
dNTP mix (10 mM)                             Promega
MlyI restriction enzyme                      NEB
50 bp DNA ladder (0.34 μg/μl)                Promega
QIAquick PCR purification kit                Qiagen
MinElute PCR purification kit                Qiagen
Golden Gate assembly kit                     NEB

Equipment:

Nanodrop spectrophotometer      ThermoFisher
Thermal cycler                  Eppendorf Mastercycler X50
                                Bio-Rad MJ Mini Personal Thermal Cycler
Centrifuge                      Eppendorf MiniSpin plus
Agarose gel electrophoresis     Fisher Scientific Mini Plus Submarine Unit
Gel documentation system        Syngene G:Box

3 Methods

3.1 MAX Randomization

MAX randomization cassette(s) are made for each discrete region of
randomization as illustrated schematically in Fig. 1. The maximum
number of saturated codons within an individual cassette is limited
by the practical length of synthesis for the template strand (Fig. 1)
and also by the minimum mass of DNA required in order to contain
all theoretical template component sequences. For example, theo-
retically, a 93-mer template oligo could contain 9 positions of
randomization (sufficient to hybridize with 9 different selection
oligo pools plus a 6 base overlap at each end). However, assuming
an average MW of a nucleotide = 330, the average MW of a 93-mer
oligo is 30,690. Nine codons each saturated with NNN equates to
64^9 or 1.8 × 10^16 different sequences of template DNA. Thus, for a
minimum of one molecule of each possible template sequence
(assuming a perfect distribution during NNN synthesis),
~30 nmol would be required (1.8 × 10^16/Avogadro’s number)
which equates to ~920 μg of template DNA. For most applications,
this is not practical. In practice, we typically recommend limiting
each cassette to a maximum of six saturated codons. By the same
calculations, this equates to just ~110 fmol or ~2.5 ng of a 66-mer
template for one copy of each template sequence and allows us to
use quantities far in excess of those minimum values and also allows
comfortably for the multiple dilutions that are required during the
randomization process. Once constructed, multiple cassettes may
later be joined together either by overlap PCR or else by seamless
cloning methods such as Golden Gate Assembly.
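
The template calculation above can be reproduced for any number of saturated codons. This is a minimal sketch using the same assumptions as the text (an average nucleotide MW of 330, a 9-base selection oligo per codon and a 6-base overlap at each end); the function names are illustrative.

```python
AVOGADRO = 6.022e23
MW_PER_NT = 330.0  # average molecular weight of one nucleotide, as in the text


def template_length(n_codons, oligo_len=9, overlap=6):
    """Template length: one 9-mer per saturated codon plus two end overlaps."""
    return n_codons * oligo_len + 2 * overlap


def min_template(n_codons):
    """Variants, moles and grams for one molecule of every NNN template."""
    variants = 64 ** n_codons
    moles = variants / AVOGADRO
    grams = moles * template_length(n_codons) * MW_PER_NT
    return variants, moles, grams


if __name__ == "__main__":
    for n in (6, 9):
        variants, moles, grams = min_template(n)
        print(f"{n} codons: {template_length(n)}-mer, {variants:.2e} variants, "
              f"{moles:.2e} mol, {grams:.2e} g")
```

Run for six and nine codons, the sketch returns values that round to the figures quoted above (~110 fmol and ~2.5 ng versus ~30 nmol and ~920 μg).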
Two constant oligonucleotides (End 1 and End 2), two pri-
mers (P1, the first 18 bases of End 1 and P2, the reverse comple-
ment of the last 18 bases of End 2), a template oligonucleotide
having conventional NNN saturation at the positions of MAX
randomization and corresponding sets of MAX oligonucleotide
pools (e.g., Fig. 1b) are ordered for synthesis. Since the majority
of these oligonucleotides are short, no special quality of DNA is
required, but if finances allow, it is convenient to order the MAX
selection oligonucleotides and the End 2 oligonucleotide prephosphorylated.
Otherwise, these oligonucleotides may be phosphorylated
manually, by using T4 polynucleotide kinase and ATP.
Depending on individual design (and typically when linking several
MAX randomization cassettes together), End 1 and End 2 oligonu-
cleotides may be long. If so, it is advisable to specify PAGE-
purification to prevent single base deletions in these nonmutated
regions. Component oligonucleotides are typically ordered
(or resuspended) at 100 μM concentration and then assembled as
follows:
1. Phosphorylated selection oligonucleotides for a given codon
are first pooled, undiluted. Pooling is repeated for each tar-
geted codon, to encode unique combinations of amino acids at
each position, as required (Fig. 1). If all selection oligos encod-
ing all 20 amino acids are combined together, this will equate
to 5 μM concentrations for each individual oligo, and a total,
combined selection pool concentration of 100 μM. Where fewer
than 20 selection oligos are pooled, the total pool concentration
will remain constant at 100 μM, but relative concentrations
of component selection oligos will be proportionally
higher (e.g., 10 μM each in a pool of 10 selection oligos).
2. Two parallel hybridization reactions are set up to contain template
oligo, End 1 and End 2 oligos and each combined selection
pool, to reach final concentrations of 10 μM in 1× ligase
buffer, typically in an initial volume of 19 μl (i.e., 2 μl of each
pool, and 2 μl each of the template, End 1 and End 2 oligos,
plus 2 μl 10× ligase buffer and H2O to 19 μl; volumes can be
adapted as required). Each mixture is then hybridized by heating
to 99 °C followed by cooling to 4 °C at a rate of 1 °C/min.
1 μl of T4 DNA ligase (400 Weiss units) is then added to
the first reaction and 1 μl of H2O is added to the parallel
negative ligation control. The reactions are then incubated
overnight at 4 °C. The control, in which ligase is omitted, is
later used to check that the subsequent asymmetric PCR amplifies
the upper ligated strand exclusively (see step 5).
3. The ligation mixes are then serially diluted to 1/10, 1/100 and
1/1000. To determine the optimum conditions for amplification
(which will vary according to individual cassettes), four
PCR master mixes are made to contain 1× Pfu buffer, 50 pmol
each of the forward and reverse primers P1 & P2, 0.2 mM
dNTPs, 2 units of Pfu DNA polymerase and 1 μl of template
(neat, 1/10, 1/100, and 1/1000 dilutions of the ligation mix
respectively), each in a 100 μl final volume. Each master mix is
then aliquoted into 10 μl volumes in PCR tubes and gradient
(annealing) PCR is performed following the temperature program:
initial denaturation at 98 °C for 2 min followed by
30 cycles of denaturation for 30 s at 98 °C, annealing for 30 s at
40–65 °C, extension for 1 min at 72 °C, and finished with a
final extension step at 72 °C for 2 min.
4. The resulting PCR products are examined by electrophoresis
and optimum template dilutions and annealing temperatures
are selected on the basis of the maximum yield of a single
product of the required length.
5. Amplification of both the ligation and the negative control is
then repeated on a 100 μl scale using the optimized template
dilution and annealing temperature and the products are again
examined by agarose gel electrophoresis. There should be no
band in the negative control, whilst the amplification under
optimized conditions should yield a single, intense band of the
required length.
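
The volumes in step 2 depend on how many selection pools a cassette contains, so it can be convenient to calculate the water volume before setting up the hybridization. The sketch below assumes 100 μM stocks, 2 μl of each component and the 19 μl initial (20 μl final) volume described above; the function name and its parameters are illustrative.

```python
def hybridization_mix(n_pools, oligo_ul=2.0, buffer_10x_ul=2.0,
                      initial_ul=19.0, final_ul=20.0, stock_uM=100.0):
    """Water volume and final concentrations for one hybridization reaction.

    Components: n selection pools plus template, End 1 and End 2 oligos,
    each added as `oligo_ul` of a `stock_uM` stock, plus 10x ligase buffer.
    The final volume includes the 1 ul of ligase (or water) added later.
    """
    oligo_total_ul = oligo_ul * (n_pools + 3)
    water_ul = initial_ul - oligo_total_ul - buffer_10x_ul
    if water_ul < 0:
        raise ValueError("components exceed the initial volume; adapt volumes")
    return {
        "water_ul": water_ul,
        "final_uM": stock_uM * oligo_ul / final_ul,
        "buffer_x": 10.0 * buffer_10x_ul / final_ul,
    }


if __name__ == "__main__":
    for pools in (3, 4):
        print(pools, "pools:", hybridization_mix(pools))
```

With these default volumes a cassette of more than four saturated positions no longer fits in 19 μl, which is one case in which the volumes would need to be adapted.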

3.2 Joining Multiple MAX Randomization Cassettes by Overlap PCR

Where multiple cassettes are required and joining is to be achieved
by overlap PCR, End 1 and End 2 oligos are designed to have an
18 base overlap with the corresponding end oligos of neighboring
cassettes.
1. In the first stage of MAX library assembly, equal high volumes
of individual cassettes (e.g., 30 μl each of 3 neighboring cas-
settes) are combined in a 100 μl reaction containing 0.2 mM
dNTPs and 1 unit of Pfu DNA polymerase and are amplified
without any additional primers, under the following cycling
conditions: initial denaturation at 98 °C for 2 min followed
by 20 cycles of denaturation for 30 s at 98 °C, annealing for
30 s and extension for 2 min at 72 °C, followed by a final
extension step at 72 °C for 5 min. The annealing temperature
is selected as a compromise between the optimal annealing
temperatures of individual cassettes.
2. The process of optimizing final product yield by serially dilut-
ing the resulting PCR product and optimizing the annealing
temperature is performed as described in Subheading 3.1, step
3, but instead using primers that flank the entire, combined
product rather than primers P1 and P2.
3. The optimized whole-product PCR is then scaled up as
required and the resulting final product is purified using a
PCR purification kit according to manufacturer’s instructions.
DNA concentration of the purified library is determined by
measuring the absorbance at 260 nm.
4. Finally, the combined product is analyzed by NGS.

3.3 Joining Multiple MAX Randomization Cassettes by Golden Gate Assembly

If Golden Gate Assembly is to be used to join MAX randomization
cassettes, the 5′ ends of the P1 and P2 oligonucleotides are
extended to contain appropriate Type IIS restriction sites and a
4 base overlap between fragments. Golden Gate Assembly is then
performed using a kit according to manufacturer’s instructions.

3.4 Denaturation Gradient PCR: A Useful Tip for MAX Randomization of Repetitive Sequences

When saturating highly repetitive sequences such as those encoding
α-helical repeat proteins, concatemers can occasionally result during
the amplification stage that cannot be resolved subsequently by
optimizing the dilution of the template and the annealing temperature
of the PCR. However, in these unusual cases, we have discovered
that such concatemers may be resolved successfully by running
a denaturation gradient PCR.
Figure 4 illustrates a 90 bp, highly repetitive gene fragment that
contained 5 saturated codons and was generated by MAX random-
ization. During PCR optimization, neither altered annealing tem-
peratures nor dilution of the template DNA gave rise to an
improvement in the yield of the required 90 bp PCR product. In
fact, multiple bands of similar intensity to the desired product could
be seen at several annealing temperatures and for all template dilu-
tions and annealing gradients tested.
Having failed to generate the required single product via tradi-
tional optimization, we then hypothesized that akin to the melt
analysis employed in some qPCR experiments [21], a reduced
denaturation temperature might prevent the denaturation (and
consequently, the amplification) of any longer, concatemeric pro-
ducts. In essence, by employing a temperature gradient at the
denaturation stage of cycling, a temperature capable of denaturing
the target sequence, but not the longer undesirable sequences,
might be identified (see Note 4).
The following methods were employed to test this hypothesis:
1. A PCR master mix was made to contain 1× Pfu DNA polymerase
buffer, 0.2 mM dNTPs, 50 pmol each of forward and
reverse primers, 3 units of Pfu DNA polymerase, and 1 μl
ligated template.
2. Having previously optimized an annealing temperature of
53.4 °C, a denaturation gradient from 81 °C to 87 °C was
applied for PCR optimization.
3. The following cycling conditions were used: initial denaturation
for 2 min at 84 °C, followed by 30 cycles of 30 s denaturation
gradient from 81–87 °C, annealing for 30 s at 53.4 °C,
extension for 10 s at 72 °C and a final extension step with
1 cycle for 60 s at 72 °C followed by hold at 4 °C. These
reagent conditions and cycling times were within manufacturer’s
recommended specifications, with the brief extension
time of 10 s selected to complement the manufacturer’s recommendation
of 2 min/1 kb of the product.
4. The resulting PCR products were electrophoresed in 3% TAE
agarose gels, stained with ethidium bromide and imaged on a
Syngene G:Box.
5. As illustrated in Fig. 4 (lane 3), subsequent amplification of
this product under the exact PCR conditions described above
with a denaturing temperature of 82.3 °C and an annealing
temperature of 53.4 °C did indeed give a good yield of the
required, homogeneous product at 90 bp.

Fig. 4 The effect of denaturation temperature on PCR amplicon production. A
denaturation gradient PCR was performed as described in Subheading 3.4, using
a predetermined, constant annealing temperature of 53.4 °C and denaturation
temperatures ranging from 81 °C to 87 °C. Lanes: M = 50 bp MW ladder; 1–8,
PCRs with denaturation temperatures of: 1, 81.0 °C; 2, 81.5 °C; 3, 82.3 °C;
4, 84.8 °C; 5, 85.9 °C; 6, 86.6 °C; 7, 87.0 °C; 8, 87.0 °C (duplicate)
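
The reasoning behind the denaturation gradient, namely that a longer concatemer should melt at a higher temperature than the 90 bp target, can be illustrated with a rough calculation. The sketch below uses a commonly quoted empirical approximation for duplex melting temperature, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) − 675/N, which is not taken from this chapter. The salt concentration and GC content are hypothetical, and the absolute values should not be expected to match the temperatures observed in PCR buffer; only the trend with length is of interest.

```python
from math import log10


def approx_tm(length_bp, gc_percent=50.0, monovalent_molar=0.05):
    """Rough duplex melting temperature in degrees C (empirical approximation).

    The default GC content and salt concentration are assumed values,
    chosen only to illustrate the length dependence.
    """
    return (81.5 + 16.6 * log10(monovalent_molar)
            + 0.41 * gc_percent - 675.0 / length_bp)


if __name__ == "__main__":
    target = approx_tm(90)
    for copies in (1, 2, 3):
        length = 90 * copies
        print(f"{length} bp: Tm shift relative to target = "
              f"{approx_tm(length) - target:+.2f} degrees C")
```

Under this approximation a two-unit concatemer melts a few degrees above the monomer, which is consistent with the narrow window explored by the 81–87 °C gradient; the window itself still has to be found empirically, as described in the steps above.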

3.5 ProxiMAX Randomization

ProxiMAX randomization can be described as an adaptation of
MAX randomization for contiguous codons. Each saturated position
within the library can be user-defined, and could be an equimolar
mixture to encode all 20 amino acids or a custom-made
sub-set where fewer than 20 amino acids are required. However,
the process and its results are quite different, not least because in
addition to equimolar encoding (typically for enzyme engineering),
ProxiMAX randomization can also encode required amino acids in
user-specified ratios (typically for antibody engineering). Impor-
tantly, unlike MAX randomization, the codon donors and their
primers, which comprise the majority of oligonucleotides (Fig. 2)
function independently of the DNA sequence being constructed
and so stocks can be used repeatedly in different experiments.
However, ProxiMAX randomization requires higher-grade syn-
thetic DNA of relatively long sequence and is therefore much
more expensive to set up than MAX randomization. Typically,
three sets (sets 1–3) of 20  66mer donor hairpin oligonucleotides
will be purchased along with three corresponding 18-mer primers
(Fig. 5a), with MAX codons chosen according to favored codons in
the expression organism. Phosphorylation of the donor hairpins is
not required. On receipt, hairpins are individually self-hybridized
by incubating at 99  C for 5 min followed by a cooling rate of 1  C/
min, to 4  C. After incubation overnight at 4  C, they are stored in
aliquots at 20  C until required.
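
When programming a thermal cycler for the hairpin self-hybridization, it can help to know how long the run will take. The sketch below simply adds the initial hold to the time needed for a linear ramp. The function name and defaults are illustrative assumptions based on the temperatures and rate quoted above; real instruments may ramp in discrete steps.

```python
def anneal_program_minutes(
    start_c: float = 99.0,
    end_c: float = 4.0,
    hold_min: float = 5.0,
    ramp_c_per_min: float = 1.0,
) -> float:
    """Total run time (min): initial hold plus a linear cool-down ramp."""
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    if end_c > start_c:
        raise ValueError("end temperature must not exceed start temperature")
    return hold_min + (start_c - end_c) / ramp_c_per_min


if __name__ == "__main__":
    total = anneal_program_minutes()
    print(f"Self-hybridization program: {total:.0f} min before the overnight 4 °C step")
```

With the values given in the text, the program runs for 100 min (5 min hold plus a 95 min ramp) before the overnight incubation.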
Fig. 5 Comparison of NGS data analysis methods in Excel (Subheadings 3.7, 3.8, and 3.9). Data represent
analysis of seven saturated positions within a gene encoding an α-helical repeat protein. Two cassettes
containing three and four saturated codons respectively were generated by MAX randomization and then
joined by overlap PCR. Data relate to approximately 800,000 sequences of 290–336 bp, preprocessed as
described in Subheading 3.6. (a) Data analysis by delimiting (Subheading 3.7). (b) Data analysis using the
“COUNTIF” function. (c) Data analysis using find and replace, “LEFT,” and “MID” functions

In manual ProxiMAX randomization for equimolar addition of
codons, the gene fragments are assembled via saturation cycling,
one codon at a time from the 3′ end, as follows:

1. For the first codon (Fig. 2), individual hairpin donors from set
1 are combined at a final concentration of 1 μM (= 1 pmol/μl)
total donor DNA (precise concentrations of individual components
will vary according to the number of donors chosen).
2. Ten pmol of total donor DNA is then ligated to 3.3 pmol
acceptor sequence in a 20 μl reaction volume containing
1× ligase buffer (containing 1 mM ATP) and 1 μl (400 Weiss
units) T4 DNA ligase, incubating at room temperature for 2 h
followed by incubation at 37 °C for 30 min. The acceptor
sequence can be any blunt-ended, double-stranded, linear
piece of DNA (synthetic, PCR product or other) that bears a
5′ phosphate group at the point of ligation (Fig. 2).
3. The ligation mix is then diluted 1000-fold and amplified using
the appropriate forward primer according to donor set and a
universal reverse primer for the acceptor sequence, in duplicate
100 μl PCR reactions containing 1× Pfu buffer, 0.1 mM
dNTPs, 50 pmol forward and reverse primers, 1 unit of Pfu
DNA polymerase, and 1 μl of diluted template.
4. Following examination by gel electrophoresis, the PCR reactions
are combined and purified using a MinElute PCR Purification
Kit following the manufacturer’s instructions.
5. The purified DNA is recovered in 30 μl of sterile dH2O and
25 μl of this purified product is digested in a 50 μl reaction
volume containing 1× CutSmart® buffer and 10 units of MlyI
at 37 °C for 1 h followed by heat inactivation at 65 °C for
20 min.
6. A sample of the resulting product is examined by electrophore-
sis in a 3% agarose gel to confirm digestion and the concentra-
tion of the remainder is determined by measuring A260 in a
NanoDrop spectrophotometer. The volume of the digested
DNA corresponding to 3.3 pmol is then calculated. Gel purifi-
cation of the digested product may be performed at this stage,
if required.
7. Steps 1–6 are repeated with the next set of oligonucleotide
donors (i.e., cycling donor sets 1–3) each time using the
MlyI-digested product from the previous round as the new acceptor,
until all required codons have been added.
8. Once sufficient codons have been added, a final, constant
region is added to the last codon by blunt-end ligation and
the resulting product is once more amplified by PCR as
described in step 3.
9. Where multiple ProxiMAX fragments are joined together, this
is typically achieved by incorporating Type IIS restriction sites
(such as BsaI) into both the acceptor sequence and the final
constant region ligated in step 8. The resulting BsaI fragments
are then gel-purified and ligated together. Owing to the different,
sticky-ended sequences generated by BsaI, ligation of fragments
in the correct order is achieved.
10. The resulting ProxiMAX library is examined by next-generation
sequencing (NGS).
11. Alternative: if required, more precise control of codon ratios
can be achieved in manual ProxiMAX addition by omitting
step 1 and performing steps 2–4 in parallel, individual reactions
(i.e., one donor per reaction). Here, concentrations of
donor and acceptor oligonucleotides are adjusted to 2.4 μM
and 0.4 μM respectively [12]. The quantified, amplified products
are then pooled at the required ratios before proceeding
to MlyI digestion (step 5).
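
The molar bookkeeping in steps 1, 6, and 7 can be sketched in a few lines. The helper names below are hypothetical, and the conversion from an A260-derived mass concentration to pmol uses an average mass of 660 g/mol per base pair, which is a common approximation and an assumption here rather than a value from the protocol. The example concentration and fragment length are invented for illustration.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (approximation, assumed)


def per_donor_conc_um(total_um: float, n_donors: int) -> float:
    """Concentration of each donor in an equimolar pool (step 1)."""
    if n_donors < 1:
        raise ValueError("at least one donor is required")
    return total_um / n_donors


def donor_set_for_codon(codon_index: int, n_sets: int = 3) -> int:
    """Donor set (1-based) used for the n-th added codon, cycling sets 1-3 (step 7)."""
    if codon_index < 1:
        raise ValueError("codon_index is 1-based")
    return (codon_index - 1) % n_sets + 1


def pmol_per_ul(ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to pmol/ul."""
    return ng_per_ul * 1000.0 / (length_bp * AVG_MASS_PER_BP)


def volume_for_pmol(target_pmol: float, ng_per_ul: float, length_bp: int) -> float:
    """Volume (ul) of digested product that contains target_pmol (step 6)."""
    return target_pmol / pmol_per_ul(ng_per_ul, length_bp)


if __name__ == "__main__":
    # 20 donors pooled at 1 uM total
    print(f"Each donor: {per_donor_conc_um(1.0, 20):.3f} uM")
    # Which donor set to use for codons 1-7
    print("Donor sets:", [donor_set_for_codon(i) for i in range(1, 8)])
    # Hypothetical acceptor: 100 bp at 50 ng/ul, 3.3 pmol needed
    print(f"Acceptor volume: {volume_for_pmol(3.3, 50.0, 100):.2f} ul")
```

The same volume calculation can be reused when pooling individually amplified products at user-defined ratios (step 11), by calling it once per product with the desired pmol amount.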

3.6 NGS Data Analysis of Saturated Libraries: Part 1, Creating an Excel Data File

NGS is undertaken routinely to assess the quality and diversity of
the generated mutant libraries. Even though many software
resources have been released over the years for evaluation of NGS
data, none of the commercially available resources are well-suited to
assess the distribution of codons at anticipated saturated positions
across a saturation mutagenesis library. Consequently, we have
developed the following data analysis strategies, which involve no
specialized equipment or programs, in order to assess MAX
randomization libraries. Here, we exemplify the strategies using a
mutant library generated using MAX randomization to saturate
seven codons in an α-helical repeat protein:
1. NGS was accomplished via a commercial Illumina MiSeq service,
using 2 × 250 bp paired-end sequencing.
2. Once the NGS data is received, the two paired end reads (fastq
format) are uploaded onto Galaxy bioinformatics software
(https://usegalaxy.org/) along with the reference library
sequence (fasta format) for alignment.
3. Since the two reads, when combined, will form the full-length
randomized library, the first step in data analysis is to join the
two paired end reads together, and this is achieved using the
“fastq-join” function (using default settings) which joins two
paired-end reads on overlapping ends.
4. The joined sequences are then subjected to quality control
using the “filter fastq” function, where the reads are filtered by
quality score and length. Since this cassette was 336 bp in length,
all sequences with a minimum length of 290 bp and a maximum
length of 336 bp were included in the analysis (a minimum
length of 290 bp was chosen so that all seven randomized
regions in the library would be included in the filtered
sequences). All other settings were left at their defaults.
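
The length filter in step 4 is performed in Galaxy, but its logic is simple enough to sketch for readers who want to check it independently. The Python below is not the Galaxy tool and does not reproduce its quality-score defaults. It only applies the 290–336 bp length window to joined reads, and it assumes simple four-line FASTQ records. All function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record")
        yield header, seq.rstrip("\r\n"), qual.rstrip("\r\n")


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 290, max_len: int = 336
) -> Iterator[FastqRecord]:
    """Keep only reads whose length lies within [min_len, max_len]."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual


if __name__ == "__main__":
    # Toy reads of 289, 290, 336 and 337 nt; only the middle two pass.
    toy = []
    for name, n in [("r289", 289), ("r290", 290), ("r336", 336), ("r337", 337)]:
        toy += [f"@{name}\n", "A" * n + "\n", "+\n", "I" * n + "\n"]
    kept = [h for h, _, _ in filter_by_length(read_fastq(toy))]
    print(kept)
```

To apply this to a real file, the open file handle of a joined FASTQ file can be passed to `read_fastq` in place of the toy list.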