Gkaa 389
1 Computational Biology and Clinical Informatics, Baker Institute, Melbourne, VIC 3004, Australia, 2 Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne,
Received March 10, 2020; Revised April 18, 2020; Editorial Decision May 04, 2020; Accepted May 16, 2020
* To whom correspondence should be addressed. Tel: +61 90354794; Email: david.ascher@unimelb.edu.au
Correspondence may also be addressed to Douglas E. V. Pires. Email: douglas.pires@unimelb.edu.au
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
W126 Nucleic Acids Research, 2020, Vol. 48, Web Server issue
using graph-based signatures, sequence- and structure-based information. mmCSM-AB models were trained using single-point mutations and the effects of multiple mutations were assessed, outperforming other available tools across our validation set of experimentally measured changes with two to 14 mutations. mmCSM-AB will help to guide rational antibody engineering by analysing the effects of introducing multiple mutations on binding affinity.

our dataset, we identified a set of 38 multiple point mutations, where each individual mutation had been experimentally characterized as a single-point missense mutation. A multiple-point mutation was defined as additive when the sum of the effects of its individual mutations was within 1 kcal/mol of the effect of the multiple-point mutation. We identified 24 additive and 14 synergistic mutations.
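The additivity criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the inclusive treatment of the 1 kcal/mol boundary, and the example values are all assumptions.

```python
from typing import Sequence

ADDITIVITY_TOLERANCE = 1.0  # kcal/mol, the threshold stated in the text


def classify_additivity(single_ddgs: Sequence[float],
                        multi_ddg: float,
                        tolerance: float = ADDITIVITY_TOLERANCE) -> str:
    """Label a multiple-point mutation as 'additive' or 'synergistic'.

    single_ddgs: measured effects of each constituent single-point mutation.
    multi_ddg:   measured effect of the combined multiple-point mutation.
    """
    deviation = abs(sum(single_ddgs) - multi_ddg)
    # Assumption: "within 1 kcal/mol" is treated as inclusive (<=).
    return "additive" if deviation <= tolerance else "synergistic"


# Hypothetical values, for illustration only.
print(classify_additivity([0.5, 0.7], 1.0))   # sum of singles is close to the measured effect
print(classify_additivity([0.5, 0.7], 3.0))   # combined effect departs from the sum
```

Every mutation that fails the additivity check is labelled synergistic here, which mirrors the two-way split reported in the text (24 additive, 14 synergistic).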
Non-binder dataset. During data curation we identified 47
Figure 1. mmCSM-AB workflow. Method development can be divided into four steps. During data preparation, 905 single-point mutations and 242 multi-point mutations across 62 complexes (50 Fab, 3 nanobody, 1 monobody, 1 Ab-Ab and 7 others, described in Supplementary Table S1) were collected. During feature extraction, the curated 3D structures were used to calculate graph-based signatures as well as complementary structure- and sequence-based attributes. In the next step, these were provided as evidence to train and test supervised learning algorithms. Greedy feature selection was performed to optimize performance on multiple mutations and reduce dimensionality. In the last step, the best-performing model was implemented in three prediction modes: the ‘Prediction mode’ for running user-specified mutations, and the Antibody and Antigen Design modes for systematically assessing permutations of multiple mutations on the antibody and antigen, respectively, at the binding interface.
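The greedy feature selection step mentioned in the caption can be illustrated with a generic forward-selection loop. This is only a sketch under stated assumptions: the text does not say whether selection ran forwards or backwards, nor what stopping rule was used, so the forward direction, the stop-when-no-improvement rule, the `score` callback and the toy feature names are all hypothetical.

```python
from typing import Callable, List, Sequence


def greedy_forward_selection(features: Sequence[str],
                             score: Callable[[List[str]], float]) -> List[str]:
    """Add one feature at a time, always taking the one that raises `score` most.

    `score` is a user-supplied function (hypothetical here) that returns a
    performance value for a feature subset, e.g. a cross-validated correlation.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Evaluate each remaining feature when added to the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


# Toy scoring function: two informative features, one uninformative,
# with a small penalty per feature to reward lower dimensionality.
def toy_score(subset: List[str]) -> float:
    gains = {"sig_1": 0.30, "sig_2": 0.20, "noise": 0.0}
    return sum(gains[f] for f in subset) - 0.01 * len(subset)


print(greedy_forward_selection(["noise", "sig_1", "sig_2"], toy_score))
```

In practice the scoring function would retrain and evaluate a model for every candidate subset, which is why greedy selection is used instead of an exhaustive search over all subsets.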
**P-value < 0.01, ***P-value < 0.001; statistical significance of Pearson’s correlation coefficient was evaluated by Fisher’s r-to-z transformation (two-tailed) and Spearman’s rank-correlation coefficient was
converted as described by Walker et al. (2003) into Pearson’s correlation before we applied the transformation.
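For readers unfamiliar with the significance test named in this footnote, the following is a minimal sketch of a two-tailed comparison of two Pearson correlations via Fisher's r-to-z transformation. It assumes the textbook independent-samples form of the test; the source does not state which variant was applied. It also does not attempt the Spearman-to-Pearson conversion attributed to Walker et al. The function name and example inputs are illustrative.

```python
import math
from typing import Tuple


def fisher_r_to_z_test(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Return (z statistic, two-tailed P-value) for H0: the two correlations are equal.

    Assumption: the two correlations come from independent samples of
    sizes n1 and n2 (both must be greater than 3).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher's r-to-z transformation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-tailed P-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical comparison of two methods evaluated on 242 data points each.
z, p = fisher_r_to_z_test(0.80, 242, 0.50, 242)
print(f"z = {z:.2f}, P = {p:.2g}")
```

The transformation makes the sampling distribution of a correlation approximately normal, so the difference between two transformed values can be tested with a standard z-test.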
mmCSM-AB was trained using different stratified cross-validation approaches. We achieved Pearson’s correlations up to 0.70 (RMSE = 1.03 kcal/mol) on 5-, 10- and 20-fold cross-validation. We further validated the model by performing a low-redundancy leave-one-complex-out validation, achieving a Pearson’s correlation up to 0.64 (RMSE = 1.10 kcal/mol). Following feature selection using a greedy algorithm, we were left with a total of 83 features. Interestingly, the only features selected for the final model were the graph-based structural signatures, the evolutionary score and the Arpeggio-calculated interactions.

Performance on multi-point mutations

Our final mmCSM-AB model achieved a Pearson’s correlation of 0.95 (RMSE = 0.91 kcal/mol, Figure 2) and 0.80 (RMSE = 1.30 kcal/mol, Figure 2) on the 101 double and triple mutations and the overall 242 multiple mutations, respectively (Table 1). Comparing the performance of mmCSM-AB to 18 alternative approaches, mmCSM-AB significantly outperformed all methods across both mutation sets (P-value <0.01, Fisher’s r-to-z transformation; Table 1). The inclusion of the hypothetical reverse mutations led to a significant improvement in performance. In particular, it resulted in a final model with significantly improved ability to classify mutations as leading to increased binding affinity (Supplementary Table S2).

For a comprehensive review of the performance in classifying favourable and unfavourable mutations across available methods, the predicted values from the comparative study (Table 1) were further classified as either increasing (ΔΔG > 0.5 kcal/mol) or decreasing (ΔΔG ≤ –0.5 kcal/mol) binding affinity. Using mmCSM-AB to classify mutations into these two categories, we achieved a Matthews Correlation Coefficient (MCC) of 0.67 and F1-score of 0.89. Supplementary Figure S6 shows the corresponding ROC curves, with our predictor outperforming other methods, achieving an Area Under the ROC Curve (AUC) of 0.94.

To evaluate whether the performance relies on the training dataset, we selected the 53 of the 101 double and triple mutations, and the 104 of the overall 242 multiple mutations, in which none of the mutations was present in the training dataset. mmCSM-AB achieved a Pearson’s correlation of 0.92 (RMSE = 0.66 kcal/mol, Table 1) and 0.77 (RMSE = 1.15 kcal/mol, Table 1) on the 53 unique double and triple mutations and the 104 unique mutations, respectively, which demonstrates the robustness of our approach and its applicability to understanding the effects of diverse multi-point mutations.

Performance on additive and synergistic multiple point mutations

From the 38 multiple mutations whose individual ΔΔGs were available in the single-point mutation dataset, we identified 24 additive and 14 synergistic sets of multiple-point mutations. The mmCSM-AB model was evaluated across both sets, showing comparable performance regardless of whether the mutation effects were additive or synergistic. mmCSM-AB achieved a Pearson’s correlation of 0.97 (RMSE = 0.84 kcal/mol) across the multiple point mutations identified as additive, and 0.94 (RMSE = 1.62 kcal/mol) across those identified as synergistic.

Performance in ranking the effect of multiple point mutations

To guide rational antibody engineering, effective tools need to be able to identify the mutations leading to the greatest improvement in binding affinity. We therefore further assessed the ability of mmCSM-AB and available tools to rank mutations in the order of most increasing and decreasing binding affinity. mmCSM-AB showed strong performance, achieving Kendall’s Tau and Spearman’s rank-correlation coefficients of up to 0.71 and 0.86 on the
242 multiple mutations, and outperforming all other approaches (Supplementary Table S3).
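The two rank-based metrics used in the ranking assessment can be computed as sketched below, using only the standard library. This is an illustration, not the authors' evaluation code. The text does not say which Kendall's Tau variant was used, so tau-a (ties count as neither concordant nor discordant) is an assumption here, and the example values are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical measured and predicted affinity changes (kcal/mol).
experimental = [-1.2, 0.3, 0.8, 2.1]
predicted = [-0.9, 0.5, 0.4, 1.7]
print(round(kendall_tau_a(experimental, predicted), 3))
print(round(spearman(experimental, predicted), 3))
```

Both metrics depend only on the ordering of the predictions, which is why they suit a task where the goal is to pick out the mutations with the largest improvement rather than to predict exact values.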
Performance on non-binders

The mmCSM-AB model was further evaluated against the 47 data points where the introduction of multiple mutations led to complete disruption of antigen binding. mmCSM-AB correctly classified 46 out of 47 non-binders (Supplemen-