Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

International Journal of Mass Spectrometry 426 (2018) 29–37

Contents lists available at ScienceDirect

International Journal of Mass Spectrometry


journal homepage: www.elsevier.com/locate/ijms

Full Length Article

Deconvolution of ion mobility mass spectrometry arrival time


distributions using a genetic algorithm approach: Application to
␣1 -antitrypsin peptide binding
Ganesh N. Sivalingam a , Adam Cryar a , Mark A. Williams b , Bibek Gooptu c ,
Konstantinos Thalassinos b,a,∗
a
Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, United Kingdom
b
Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, London, WC1E 7HX, United Kingdom
c
Leicester Institute of Structural and Cellular Biology and NIHR BRC-Respiratory, Henry Wellcome Building, Lancaster Road, Leicester, LE1 7RH, United
Kingdom

a r t i c l e i n f o a b s t r a c t

Article history: Ion mobility mass spectrometry (IM-MS) is a fast and sample-efficient method for analysing the gas phase
Received 2 August 2017 conformation of proteins and protein complexes. Subjecting proteins to increased collision energies prior
Received in revised form to ion mobility separation can directly probe their unfolding behaviour. Recent work in the field has
19 December 2017
utilised this approach to evaluate the effect of small ligand binding upon protein stability, and to screen
Accepted 5 January 2018
compounds for drug discovery. Its general applicability for high-throughput screening will, however,
depend upon new analytical methods to make the approach scalable. Here we describe a fully automated
Keywords:
program, called Benthesikyme, for summarising the ion mobility results from such experiments. The
Ion mobility mass spectrometry
Travelling wave ion mobility
program automatically creates collision induced unfolding (CIU) fingerprints and summary plots that
Collision induced dissociation capture the increase in collision cross section and the increase in conformational flexibility of proteins
Software’ Data processing during unfolding. We also describe a program, based on a genetic algorithm, for the deconvolution of
Genetic algorithm arrival time distributions from the CIU data. This multicomponent analysis method was developed to
require as little user input as possible. Aside from the IM-MS data, the only input required is an estimate
of the number of conformational families to be fitted to the data. In cases where the appropriate number
of conformational families is unclear, the automated procedure means it is straightforward to repeat the
analysis for several values and optimize the quality of the fit. We have employed our new methodology
to study the effects of peptide binding to ␣1 -antitrypsin, an abundant human plasma protein whose
misfolding exemplifies a group of conformational diseases termed the serpinopathies. Our analysis shows
that interaction with the peptide stabilises the protein and reduces its conformational flexibility. The
previously unresolved patterns of unfolding detected by the deconvolution algorithm will allow us to set
up a fully automated screen for new ligand molecules with similar properties.
© 2018 Published by Elsevier B.V.

1. Introduction pared to more established tools used in the field, namely the ability
to study proteins of high flexibility [3–5] and polydispersity [6],
1.1. Ion mobility and native conformation the ability to monitor and separate distinct conformational families
co-existing in solution [7,8] and all this whilst having modest sam-
Ion mobility mass spectrometry (IM-MS) is increasingly used ple requirements. IM-MS separates ions, typically generated from
in the field of structural biology to address a number of challeng- nano-electrospray ionization (nESI), based on their mass, charge
ing questions relating to the structure and dynamics of proteins and collision cross section (CCS). A large number of studies have
and protein complexes [1,2]. IM-MS offers several advantages com- shown that under carefully controlled experimental conditions and
at low collision energies, the structures from IM-MS are closely
related to those in solution for globular proteins and protein com-
∗ Corresponding author at: Institute of Structural and Molecular Biology, Division
plexes [9–11].
of Biosciences, University College London, Gower Street, London, WC1E 6BT, United
Kingdom.
E-mail address: k.thalassinos@ucl.ac.uk (K. Thalassinos).

https://doi.org/10.1016/j.ijms.2018.01.008
1387-3806/© 2018 Published by Elsevier B.V.
30 G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37

1.2. Gas phase unfolding of proteins [27] has also led to an expansion in the number of laborato-
ries that have access to this technology. As a consequence of an
By intentionally increasing the internal energy of an ion prior increased IM-MS user base, and the development of new types of
to IM-MS analysis, however, a number of interesting observations data acquisition methods such as CIU, new computational methods
can be carried out. In some of the early applications of IM-MS to to process these data are required. Programs to process the IM-MS
study protein structure, the groups of Jarrold, Clemmer and Bowers data [28,29], and CIU data [30,31] have been recently developed. It
used injection energy studies to probe the unfolding of a number is particularly desirable to report data from IM-MS studies in a stan-
of model and disease-related proteins [12–14]. These experiments dardised, quantitative manner to allow them to be compared. This
revealed the relative stability of different proteins. Proteins held requires the definition of parameters that reduce the dimension-
together with disulphide bonds were more resistant to unfolding ality of IM-MS data whilst retaining important information about
compared to those without them. Injection energy studies have also the ensemble behaviours that IM-MS is particularly well suited to
been used to probe the oligomeric state for a number of proteins characterise.
involved in protein aggregation diseases. Oligomers dissociate to Here we describe a new computational approach for the analysis
lower aggregation states and eventually to monomer at increasing of CIU experiments. We present the benefit of summarising the data
injection energies [15]. using the intensity weighted mean (IWMATD ) and standard devia-
tion (IWSDATD ) of each arrival time distribution (ATD). The results
1.3. Unfolding of protein complexes are a quantitative method for representing the degree of unfolding
in terms of change in CCS. We also describe a new genetic-based
Collision induced dissociation (CID) followed by IM-MS analysis algorithm for the deconvolution and subsequent multicomponent
has also been used to study large protein complexes [10]. Unlike analysis of CIU data, capable of dealing with the conformational
CID of peptides, which results in the fragmentation of the peptide heterogeneity of unfolding proteins. Deconvolution allows accu-
bond, for non-covalent complexes the increase in energy results in rate determination of the centre of conformational populations.
unfolding, typically of the smallest surface accessible subunit and It also facilitates more accurate calculation of the abundance of
so the term collision induced unfolding (CIU) is used in this case. each resolvable conformational family in comparison to existing
This unfolding is followed by an asymmetric dissociation of the methodology, which uses peak heights. To demonstrate the effi-
complex with many charges from the precursor being transferred cacy of these methods, three proteins that have been previously
to the smallest subunit, which is now able to accommodate more well-characterised by native MS methods have been analysed;
charges as it has a larger surface area than prior to its dissociation myoglobin, lysozyme and ␤-lactoglobulin, as well as the protein ␣1 -
from the complex [16]. More recent work, however, has shown that antitrypsin in both its apo-form and bound to a peptide previously
the above process depends on the tertiary structure of the protein shown to inhibit its aggregation.
complex [17] and on the charge state selected for CIU [18]. Since Benthesikyme, our software for summarizing the data is
its initial description, CIU has been used to study a number of dif- available from our website (http://www.homepages.ucl.ac.uk/
ferent processes, including gas phase protein stability due to anion ∼ucbtkth/resources.html) while the genetic algorithm program is
and cation binding [19,20], peptide binding [21], and lipid binding available on request.
selectivity [22].
2. Materials and methods
1.4. Protein ligand binding and drug screening
2.1. Mass spectrometry sample preparation
Another major application area of the CIU-IM-MS approach is
examining the effect of ligand binding on the protein or protein Proteins were analysed in 200 mM ammonium acetate (Sigma
complex structure. Comparison of the arrival time distributions Aldrich, St. Louis, MO). Buffer exchange was carried out using
obtained at increased collision energies between apo- and ligand- BioRad (Hercules, CA) BioSpin 6 columns, with additional concen-
bound forms, for a number of proteins [23] and protein complexes, tration and dilution steps using Amicon Ultra 0.5 ml centrifugal
enabled characterisation of the changes in structural stability filter devices (Millipore UK Ltd, Watford, UK). The concentration
induced by ligand binding. The influence of different ligand-bound of these samples was monitored using a Qubit 2.0 fluorometer (Life
states has also been examined; illustrating the power of this Technologies, Carlsba␣d, CA). Samples were analysed after diluting
approach in studying heterogeneous systems having multiple lig- to a final concentration of 10 ␮M.
and binding stoichiometries [24,25].
Ligand-specific CIU signatures may report other consequences 2.2. Peptide binding to ˛1 -antitrypsin
of ligand binding. This has been demonstrated for small molecule
inhibitors of the protein kinase domain of the Abl protein. These are ␣1 -antitrypsin was buffer exchanged into 250 mM ammonium
defined by their binding specificity as type I (non-specific for the acetate (pH 7) by 3 rounds of dilution-concentration using 10 kDa
active conformer of the Abl kinase domain) or type II (active state Millipore Amicron Ultra centrifuge filters. The Ac-TTAI-NH2 pep-
specific) inhibitors. Although binding of either type of ligand to the tide was also made up in ammonium acetate (pH 7). A molar ratio
protein did not substantially alter the conformation at low collision of 2:1 peptide:protein was used and the sample incubated for 72 h
energies, they displayed distinct unfolding signatures. These ‘ref- at 37.5 ◦ C before analysis. The final concentration of ␣1 -antitrypsin
erence’ type I and type II signatures were then compared to the CIU was 15 ␮M.
signatures of novel ligands to predict their inhibitory mechanism
[26]. 2.3. IM-MS analyses

1.5. The need for computational tools Experiments were performed on a Synapt HDMS mass spec-
trometer (Waters Corp., Manchester, UK) [27]. The Synapt
Consequently, the unfolding of proteins and protein complexes geometry has two collision cells bracketing the ion mobility device:
in the gas phase followed by IM-MS analysis provides valuable the trap, which is situated before the IM device, and the transfer
information that cannot be easily obtained by other structural which is located after the IM device. To carry out CIU experiments,
methods. The development of commercial IM-MS instrumentation it is the voltage in the trap cell that is increased. The instrument
G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37 31

was mass calibrated using 30 ␮M caesium iodide (Sigma Aldrich, Simulated data are generated as a sum of Gaussian peaks as
St. Louis, MO) dissolved in 250 ␮M ammonium acetate. 2.5–4.5 ␮l defined in Eq. (1), where d is the number of data points per ATD, n
aliquots of sample were delivered to the mass spectrometer using the number of Gaussian distributions (conformations) being fit, S
gold coated borosilicate capillaries prepared in-house [32]. is the vector of simulated data points and ␮,  and A are the mean
The source temperature was 40 ◦ C, the capillary voltage was set position, full width half maximum (FWHM) and amplitude of the
to 1.1 kV, and the cone voltage was kept at 30 V throughout all the Gaussian distribution, respectively.
experiments. The gas in the T-Wave IM separation cell was nitro- 2
( )
xi,j −i
gen, and the pressure was 0.54 mbar. The bias voltage was kept at 
d

n − √ 2
20 V while the T-Wave velocity was set to 350 m/s. The T-Wave S= Ai e (
2 i /2 2 ln 2) (1)
height was optimised at 9 and 10 V for the study of the model pro- j=1 i=1
teins and ␣1 -antitrypsin, respectively. Unfolding of proteins was
performed by varying the trap voltage. The TOF analyzer was oper-
3. Results
ated in V-optic mode and tuned for an operating resolution of 9000
FWHM. Mass spectra were acquired at an acquisition rate of 2 spec-
3.1. Summarising unfolding data
tra/sec with an interscan delay of 100 ms.
CCS calibration curves were calculated using proteins with
The large amount of data produced by collision induced unfold-
known collision cross sections [33] acquired using the same ion
ing experiments can be difficult to represent graphically. In many
mobility parameters as the sample under study. The data were
cases the ATDs are displayed by vertically stacking them. Such a
extracted and calibrations applied using the Amphitrite software
figure illustrates the key structural features of each distribution
package [29,34].
clearly. When the number of energy conditions increases how-
For the ␣1 -antitrypsin unfolding experiments the quadrupole
ever, quantitative evaluation of multiple overlapping and emerging
was used to isolate the +13 charge state. The CCS calibration curves
features can become difficult to discriminate between ATDs (for
for these experiments were determined using ␤-lactoglobulin
example see Supplementary Fig. 1). Representing the relevant data
monomer and dimer charge states as well as concanavalin A
directly as a user-friendly readout is important to allow the use
tetramer.
of this strategy in a high-throughput setting [35] for rapid iden-
Experiments analysing model proteins also used the quadrupole
tification of promising ‘hits’ without detailed, time-consuming,
to isolate certain charge states (+7 lysozyme, +8 myoglobin and +8
processing of the resulting raw data.
␤-lactoglobulin). All data reported for myoglobin are only for the
To illustrate the issues involved in summarising ATD data,
heme-bound form. CCS calibration curves were calculated using
Fig. 1A shows three hypothetical ATDs generated using Eq. (1). Even
BSA and bradykinin as the calibrants.
though the maximum intensity is found at an arrival time of 10 ms
in all three ATDs, the distributions are quite different. Simply select-
2.4. Software development ing the position of the peak maximum, would not differentiate
between them. ATD (a) consists of a single narrow conformational
The genetic algorithm was developed in the C programming lan- family, (b) has an additional conformation at a higher arrival time
guage before transferring to Cuda C. The software has been tested (shown as a shoulder on the main peak), while (c) represents an
on GNU/Linux and Windows. Successfully tested graphics cards ATD that is much broader than that shown in (a).
include Nvidia Tesla C1060, GeForce 560 Ti and 660 Ti. In order for a single metric to compactly summarise an ATD its
Processing and displaying the results generated by the genetic calculation must capture changes in the average arrival time, in
algorithm is handled by the Python programming language, in peak width, and the presence of distinct minor conformations. The
conjunction with Python modules Numpy, Scipy, Matplotlib and intensity weighted mean of arrival times (IWMATD ) is sensitive to
Amphitrite [29]. changes in these features. IWMATD is calculated as in Eq. (2), where

Fig. 1. (A) Synthetic data demonstrating spectral differences that can be identified by the proposed summary statistics – Arrival time (td ) of the spectral maximum is shown
as blue line and the intensity weighted average (IWMATD ) is marked in red and the intensity weighted standard deviation (IWSDATD ) given in the top right of each panel; (B)
CIU fingerprints for lysozyme (left), ␤-lactoglobulin (middle) and myoglobin (right); Unfolding curves for model proteins presented as (C) intensity weighted mean collision
cross-section (IWMCCS ), (D) change in IWMCCS , and (E) IWSDCCS (indicative of conformational variability). (For interpretation of the references to color in this figure legend,
the reader is referred to the web version of this article.)
32 G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37

I is the intensity in each time interval, t is the arrival time and n is for lysozyme the variability of the CCS increases almost uniformly,
the number of data points in the arrival time axis. indicating that the protein is still experiencing relatively early, par-
n tial unfolding and has not begun converging on a single ensemble
Ii ti
IWMATD = i=1
n (2) of structures.
I
i=1 i Summarising the data as described here allows for a repro-
ducible and objective means of describing ATDs with no
The intensity weighted standard deviation (IWSDATD ), calcu-
requirement for peak fitting to the data. These metrics are fast to
lated using Eq. (3), is a complementary metric that reports the
perform, allow for the quantification of differences in stability and
degree of conformational variation represented by the overall ATD
conformational diversity between different proteins and different
and is directly comparable between different ATDs.
bound forms of a protein. The use of our described methodology

n therefore concisely reports upon multiple useful data characteris-
I
i=1 i (ti
− IWMATD )2
IWSDATD = n (3) tics present in the original ATDs or CCS vs. collision energy plots.
I
i=1 i
3.2. Multicomponent analysis of ATDs
The IWMATD and IWSDATD values for the synthetic example are
indicated on Fig. 1A, and, as intended, these metrics summarise The unfolding of proteins often yields several intermediate
the properties of the distributions in a way that allows them to conformational families. Changes in the abundance of such inter-
be distinguished, i.e. distributions (a) and (b) differ in both metrics, mediate conformational families can be used to compare different
and while (a) and (c) have the same value for IWMATD, the IWSDATD variants of a given protein, e.g. wild-type versus mutant forms, or
value for (c) is much greater. to probe the effects of ligand binding to a protein. Up until now,
IM-MS CIU data were obtained on three model proteins, estimates of the abundance of conformers have been made using
lysozyme, ␤-lactoglobulin and myoglobin (Fig. 1B) and used to peak heights at manually chosen drift times. The individual ATDs
analyse the utility of this approach with real data. In order to for each conformational family, however, are almost always never
remove the effects of mass and charge, and provide a direct com- baseline resolved, therefore, inferring their abundance using peak
parison between proteins, we choose to convert the ATDs to CCSs heights does not provide a true reflection of their actual abundance.
(see materials and methods) and the trap collision voltages to lab- The strongly overlapped character of the data also makes the use of
oratory frame voltage (charge state multiplied by voltage, which gradient descent optimisation techniques, such as non-linear least
enables the comparison of different charge states). The equivalent squares, inefficient in fitting individual Gaussians (each used to rep-
metrics are then the intensity weighted mean of the CCS values (Eq. resent one conformational family) to the data. Here we describe and
(4)) and the corresponding intensity weighted standard deviation evaluate a novel genetic algorithm (GA)-based approach for the fit-
(Eq. (5)). ting and deconvolution of ATDs obtained from gas phase unfolding
n experiments.
I CCSi
i=1 i Genetic algorithms (GAs) are a class of artificial intelligence opti-
IWMCCS =  n (4)
I
i=1 i
misation algorithms, which mimic evolution in order to find an
optimal solution. Potential solutions to a problem are encoded as
and chromosomes and a set of chromosomes make up a population.

n A single chromosome contains all the parameters which are to be
I
i=1 i (CCSi
− IWMCCS )2
IWSDCCS =  n (5) optimised, which are known as genes. Populations of chromosomes
I
i=1 i evolve over a number of generations with different evolutionary
processes operating on these genes and chromosomes (crossover,
where I is the intensity in each CCS interval.
mutation) with each chromosome evaluated for its fitness in find-
Fig. 1C shows the results of calculating the IWMCCS from the
ing an improved solution [36,37].
experimental data. For lysozyme, IWMCCS does not change as much
A purpose-built framework for deconvoluting unfolding data
as for the other two proteins. This is expected as lysozyme is held
has been developed and is presented here. An important feature
together with four disulphide bonds. Whilst ␤-lactoglobulin and
of proteins revealed in the ion mobility data is that, as a pro-
myoglobin have very close CCSs at low energies, at higher ener-
tein unfolds in the gas-phase, it shifts between discrete families of
gies myoglobin (that contains no disulphide bonds) unfolds more
conformations and does not progressively unfold in a continuous
readily than ␤-lactoglobulin (stabilized by disulphide bonding).
manner. This feature is used as a constraint in the fitting procedure;
After the IWMCCS of myoglobin reaches its highest value, it starts
multiple ATDs recorded at different collision energies are deconvo-
to reduce. This effect is likely due to the heme group dissociat-
luted simultaneously but the peak centres of each conformation are
ing from the most extended conformational family, which causes
kept constant across all ATDs.
a m/z shift outside the quadrupole isolation window, and conse-
The fitness (F) of a chromosome is calculated as the additive
quently an over representation of the more compact conformations
inverse of the error. Two methods for calculating error can be used,
is observed. In Fig. 1D the difference in CCS in relation to the CCS
the sum of absolute differences and the sum of squares. We choose
measured at the lowest voltage is shown. This representation sim-
to use the sum of squares error in Eq. (6) where a is the number
plifies for the direct comparison of proteins that differ in size.
of ATDs in the dataset, ␮,  and A are the mean, full width half
Fig. 1E shows the intensity weighted standard deviation CCS
maximum (FWHM) and amplitude of each Gaussian distribution,
(IWSDCCS ) as a function of increasing collision energies. Regions
respectively.
with higher standard deviation values are indicative of confor-
mational variability; the protein likely exists in several distinct ⎛ ⎛ 2
⎞2 ⎞
(xi,j,k −i )
conformations increasing the spread of the ATD. For myoglobin a

d

n
⎜ ⎜

2  /2 2 ln 2) ⎟ 2 ⎟
√ 2
the highest IWSDCCS is observed at 100–200 V, (from the CCS data F =− ⎝ ⎝Ai,k e ( i,k ⎠ − yj,k ⎠ (6)
in Fig. 1B this appears to correspond to a mixed population of k=1 j=1 i=1
semi-compact and more extensively unfolded protein) while for ␤-
lactoglobulin this is observed at 180–280 V. For both these proteins, This fitness measure adds a greater penalty for large devia-
there is a decline in this value at higher collision energies indi- tions, in comparison to the sum of absolute differences, between
cating that fewer distinct conformations are present. In contrast, experimental and simulated data allowing it to more aggressively
G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37 33

grey, are 63% and 35% using the peak height method, but 79% and
20% for the peak area analysis thus illustrating the benefits of defin-
ing conformational abundance as the relative areas enclosed within
deconvoluted peaks rather than simply using their heights.

3.4. CIU-IM-MS characterization of ˛1 -antitrypsin interaction


with the tetrapeptide TTAI

To illustrate the applicability of our new software in analyzing


a real-world biological problem, we used it to study the effects of
peptide binding on the conformational behavior of the protein ␣1 -
antitrypsin.
␣1 -antitrypsin is the most abundant human plasma protease
inhibitor, circulating at concentrations of 1–2 mg/ml. Under phys-
iological conditions ␣1 -antitrypsin, like other members of the
serpin (serine protease inhibitor) protein superfamily, adopts a
metastable conformation upon folding within the cell. An unstruc-
tured region in the protein termed the reactive centre loop (RCL)
is used as substrate for the target enzyme. A transition from a
metastable to a hyperstabilised enzyme-complexed state underlies
their functional mechanism [38] during which, following cleavage
by the enzyme, the RCL inserts as an extra strand in the main beta-
sheet of ␣1 -antitrypsin. Homopolymerisation of ␣1 -antitrypsin,
Fig. 2. Deconvolution and abundance analysis of ␤-lactoglobulin gas phase unfold- due to pathological mutations that enable strand insertion without
ing. (A) Deconvoluted ATDs, experimental data is shown in black and the simulation cleavage, results in a disease called ␣1 -antitrypsin deficiency which
is red, individual conformations are coloured the same throughout the figure. (B) Rel-
ative proportions of deconvoluted conformational families assessed by area under
is associated with severe lung (early-onset, panacinar emphysema)
the corresponding Gaussian. (C) Relative abundance determined using peak height and liver (hepatic cirrhosis, hepatocellular carcinoma) disease
analysis. (For interpretation of the references to color in this figure legend, the reader [39,40].
is referred to the web version of this article.) We have recently analysed the interaction of ␣1 -antitrypsin
with TTAI, a peptide identified from a combinatorial chemistry
optimise against larger deviations between the fit and the data. The approach that has been shown to inhibit ␣1 -antitrypsin polymeri-
result of this is that the algorithm obtains fits that are close to the sation [41], using a combination of native and ion mobility mass
original data within fewer generations. spectrometry, X-ray crystallography and NMR spectroscopy [25].
Multiple runs of the algorithm were performed in order to We observed that TTAI binds with a 2:1 peptide:protein stoi-
establish the optimal, mutation, crossover and population size (see chiometry and that binding of two copies of TTAI increases the
Supplementary text and Supplementary Figs. 3 and 4). conformational stability of ␣1 -antitrypsin. We demonstrated that
IM-MS is an ideal method for high-throughput screening of com-
3.3. Deconvolution and abundance analysis of CIU in model pounds for the inhibition of diseases mediated by protein kinetic
proteins instability. However, although the time required to obtain the CIU
IM-MS data was much less than either the X-ray or NMR exper-
To test our algorithm we first used it to deconvolute iments, data analysis was limited and labor-intensive due to the
synthetically generated collision-induced unfolding data. The lack of programs to process CIU data at that time. We therefore
deconvolution algorithm achieved very good agreement with the focused on deconvoluting the ATDs between bound and unbound
original data (Supplementary Fig. 2). We then tested our algorithm versions at only one collision voltage representative of the most
on the CIU data obtained for three model proteins ␤-lactoglobulin conformational variability. Here we revisit the binding of TTAI to
(∼18 kDa), myoglobin (∼17 kDa) and lysozyme (∼14 kDa). The ␣1 -antitrypsin by a) acquiring many more collision voltages during
results of the fitting process for ␤-lactoglobulin are shown in Fig. 2. the unfolding process and b) processing the data using our newly
Agreement between experimental and simulated data, generated developed software so that abundance of each conformational fam-
from parameters determined by the GA-based deconvolution, is ily is measured across the entire CIU experiment. The ability to deal
very good with an overall error of 19.4 (the sum of all differences with more finely grained collision voltages during a CIU experiment
between simulated and original data as a proportion of the base ensures that greater detail of the structural transitions, at specific
peak). Individual conformations are coloured differently with the energies, is observed for the native (apo) form. This enables the
sum of all conformations coloured in red. There is a major struc- study of conformational intermediates and how these are affected
tural transition between 120 and 280 eV. At voltages of 280 eV new by small molecule/peptide binding events.
conformational families, distinct from to those at voltages less than The CIU fingerprints for apo and double TTAI-bound ␣1 -
120 eV, become populated. This observation is in agreement with antitrypsin are shown in Fig. 3A. It can be seen that the variation
the summary representation of such data shown in Fig. 1, but here in CCS is greater for the apo protein than for the peptide-bound
the individual conformational families can be deconvoluted and protein indicating that the double-bound protein has less confor-
their areas quantified as shown in Fig. 2B. The area under the curve mational flexibility than the unbound protein.
for each conformation defined by deconvolution was normalised Unfolding curves were generated from the data, whereby the
to the percentage of all conformations present. Fig. 2C shows the increase in IWMCCS is plotted against increases in the collision
equivalent population estimates of each conformation when only energy (Fig. 3B and C). The unfolding curves in panel A show
peak heights are taken into account. Even though the overall trends that even though the CCS of the double-bound protein is larger
shown in Fig. 2B and C are similar, the relative abundances of each than the apo form under native-like conditions, as the collision
conformation are different. For the lowest collision energy, the energy increases the apo form appears to unfold more readily.
adundances of the two major conformations, coloured blue and This is shown more clearly when monitoring the change in CCS as
34 G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37

Fig. 3. ␣1 -antitrypsin collision induced unfolding fingerprints (A) for apo (left) and TTAI-bound (right); (B) unfolding curves showing the IWMCCS at varying collision energies,
(C) as with (B) except showing change in IWMCCS from the lowest voltage, (D) Conformational variability analysis; IWSDCCS vs. collision energy.
G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37 35

Fig. 4. Deconvolution of gas phase unfolding data for apo (A) and double bound (B) ␣1 -antitrypsin. Experimental data is shown in black, with the sum of the deconvolution in
red. The deconvoluted distribution of each conformation is shown with coloured lines. The abundance analysis of each conformational family determined by the deconvolution
is shown for the apo (C) and holo (E) protein with the FWHM for each of the deconvoluted peaks shown in E and F respectively.

a function of increasing collision energy. The double-bound form ent species in the gas phase during the timescale of the IM-MS
consistently demonstrates a reduced degree of unfolding in terms experiment. Conformational exchange rates that are slow relative
of increase in CCS. to this timescale will tend to cause distinct inflection points, whilst
To quantify and further confirm the change in conformational those that are more rapid will tend to cause peak broadening. More-
flexibility seen in the CIU fingerprint analysis, the IWSDCCS was over, the differences in CCS between different conformer families
plotted (Fig. 3D). Critically the bound form has a 30–40% smaller of the same monomeric protein approach the technical limits of
IWSDCCS , indicating lower conformational heterogeneity, across all current resolution on the instruments used. Together these factors
collision energies. This confirms that the reduction in conforma- result in increased peak overlap. Deconvolution is therefore key
tional lability (increase in kinetic stability) induced by TTAI binding to quantitative analysis of such phenomena. The results of such
previously described at higher energies by CIU-IM-MS holds true deconvolution using our GA-based algorithm are shown in Fig. 4.
at the lowest energy (most native-like) conditions. Towards the Any deconvolution approach is potentially vulnerable to overfit-
highest collision energy, the variability of the apo form appears ting, whereby increasing the number of species (i.e. the number of
to start increasing; this may indicate another unfolding event tak- Gaussians to include in the deconvolution) will improve the appar-
ing place. In contrast, these energies are associated with reduced ent fit of the solutions to the data, beyond the resolution of the data.
variability in the ternary-complex state. This indicates that at the We were therefore keen to establish a quantitative method to select
highest energies studied the protein:peptide complex converges the appropriate number of families that the data could define. To
on a specific, stable conformation. It is possible, however, that this this end we ran our GA algorithm several times, each time vary-
convergence could represent the behaviour of the peptide-bound ing the number of conformational families (each represented by a
state before peptide release since TTAI dissociation would cause a Gaussian) to be fitted and the seed number used by the GA to ran-
concomitant shift out of the region of the m/z spectrum selected for domly generate the initial population. The resulting ‘elbow plot’ of
IM-MS analysis. fitting error versus number of conformational families is shown in
Supplementary Fig. 5. This plot shows that the fitting error reaches
3.5. Deconvolution of ˛1 -antitrypsin data a plateau beyond approximately 5 or 6 families. The results shown
in Fig. 4 from the fitting of the ATDs with 6 conformational families
The summary statistics shown in Fig. 3 confirm that on average and from the run that resulted in the lowest fitting error.
the binding of TTAI stabilises the ensemble behaviour of ␣1 - The individual peaks fitted to the data for the apo and holo forms
antitrypsin. We used our GA-based algorithm to understand this of the protein are shown in Fig. 4A and B respectively. The abun-
in more detail, in terms of the populations of different conformeric dance for each deconvoluted peak is shown in Fig. 4C and E while
states within the ensemble. The results of the deconvolution are Fig. 4D and F show the variation in FHWM for each peak across the
shown in Fig. 4. collision energy ramp.
In the original ATD data, families of different conformational The more compact conformational family, coloured in grey, is
states are indicated by inflection points (directly observable ‘peaks’, present over the same range of collision energies and persists until
‘shoulders’ and ‘tails’). The relative populations of the different about 30 V, after which it disappears. However, the width of this
states, however, cannot be directly assessed and then compared family, as shown in Fig. 4D and F, is much narrower for the holo than
without deconvolution. ␣1 -antitrypsin is a highly metastable the apo form, illustrating stabilization induced by peptide binding.
protein in solution and so the composite ATD profile likely This is consistent with a reduction in conformational lability relat-
reflects varying rates of conformational exchange between differ- ing to increased activation energy for transitions between states
36 G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37

Fig. 5. Deconvoluted CIU fingerprints for apo (A) and double bound (B) ␣1-antitrypsin. Each conformational family is shown as a heat map during the collision energy ramp.
The abundance, variability and persistence of each conformation can be seen at the same time. To make each conformational family clearer, the arrival time axis is shown as
the difference from the conformation mean, shown for each plot in the bottom left.

within the conformational family, that are more evident in the apo screening putative drug molecules for their effects upon the confor-
form. It is interesting to note that the third and fourth conforma- mational behaviour of the protein. For proteins liable to pathogenic
tional families, coloured in purple and green, appear at much lower conformational change each of these measures may have specific
energies in the apo versus the holo form and persist over a larger relevance. In the case of ␣1 -antitrypsin, the conformational vari-
range of voltages. We attribute the delay in appearance of these ability of disease variants may correlate better with pathogenicity
families in the holo-form to peptide-induced stabilization. than overall stability or resistance to unfolding [25,42].
Our program also allows the same data to be represented In a high throughput process, our summary metrics could allow
more intuitively, in heat map format so that the abundance, con- standardised selection of molecules of interest for subsequent, in-
formational variability (width of arrival time distribution) and depth analysis, by the GA deconvolution algorithm. An example of
persistence (range of collision voltages) of each family can be seen this would be to select ligands where at a given voltage the percent-
at once as seen in Fig. 5. This view makes it much easier to evalu- age change in CCS is under 5% and the standard deviation is under
ate the effects of peptide binding on the different conformational 150 Å2 . Without the introduction of robust CIU processing software
families. Such differences may have been difficult to detect and, the use of consistent threshold values would not be possible due to
certainly, to accurately quantify without multi-component decon- the high variability of manual interpretation.
volution analysis of these data. The GA deconvolution algorithm can produce detailed infor-
mation for the unfolding process within the same time frame as
the data acquisition. For increased accuracy we would recommend
4. Conclusions
proceeding to run the GA multiple times with varying number of
conformational families as this will provide evidence that helps
The benefits of statistically summarising gas phase unfolding
in selecting the most appropriate number of Gaussian peaks to fit
data have been outlined here. By using the intensity-weighted
to the data. Running the GA a few times with different random
mean (IWMATD and IWMCCS ) and standard deviation (IWSDATD and
seeds, used by the GA to generate the initial population, can have
IWSDCCS ) to describe the ion mobility data (Fig. 1) the degree of
an effect on the resulting fit. The user can then assure themselves of
unfolding and conformational flexibility can be readily compared
the robustness of the fitting procedure and select the run resulting
between different proteins and protein-ligand complexes. A novel
in the lowest overall error.
method for deconvoluting IM-MS arrival time distributions, based
We have illustrated how this approach can be used for this pur-
on a genetic algorithm, is also presented. This approach offers a
pose by studying the effects of peptide binding on the medically
more accurate representation of the peak positions, abundance
important protein ␣1 -antitrypsin. Our current analysis reveals a
and CCS variation for each conformational family present in the
more in-depth picture of how peptide binding alters intermedi-
unfolding data.
ate stability compared to our previous study [25]. Future work
The novel methodology described here provides many of the
will involve screening a large number of drug molecules against
features necessary for application in high throughput drug discov-
wild-type and disease variant forms of ␣1 -antitrypsin. Once a large
ery. Data can be automatically acquired by chip-based nESI robots
number of molecules are screened, machine learning methods can
like the TriVersa NanoMate. Raw data can be extracted in an auto-
be used to classify them and generate ‘fingerprints’ such as in the
mated fashion using our previously published software, Amphitrite
work of [26]. The availability of such fingerprints would likely fur-
[29]. Our current software can then automatically create unfolding
ther limit the time of experimental analysis as only data at select
and variability curves along with CIU fingerprints. The software
presented here reports quantitative measures of benefit when
G.N. Sivalingam et al. / International Journal of Mass Spectrometry 426 (2018) 29–37 37

collision energies would be required in order to classify the behav- [17] E.B. Erba, et al., Ion mobility-mass spectrometry reveals the influence of
ior of a new drug. subunit packing and charge on the dissociation of multiprotein complexes,
Anal. Chem. 82 (23) (2010) 9702–9710.
[18] K. Pagel, et al., Alternate dissociation pathways identified in charge-reduced
Acknowledgements protein complex ions, Anal. Chem. 82 (12) (2010) 5363–5372.
[19] L. Han, et al., Bound anions differentially stabilize multiprotein complexes in
the absence of bulk solvent, J. Am. Chem. Soc. 133 (29) (2011) 11358–11367.
Adam Cryar was funded by a BBSRC CASE studentship [20] L. Han, S.J. Hyung, B.T. Ruotolo, Bound cations significantly stabilize the
BB/F016948/1. Bibek Gooptu was supported by an Alpha-1 Foun- structure of multiprotein complexes in the gas phase, Angew. Chem. Int. Ed.
dation Research Project Grant. Engl. 51 (23) (2012) 5692–5695.
[21] M.P. Nyon, et al., An integrative approach combining ion mobility mass
spectrometry, x-ray crystallography, and nuclear magnetic resonance
Appendix A. Supplementary data spectroscopy to study the conformational dynamics of alpha1-antitrypsin
upon ligand binding, Protein Sci. 24 (8) (2015) 1301–1312.
[22] A. Laganowsky, et al., Membrane proteins bind lipids selectively to modulate
Supplementary data associated with this article can be found, in their structure and function, Nature 510 (7503) (2014) 172–175.
the online version, at https://doi.org/10.1016/j.ijms.2018.01.008. [23] J.T.S. Hopper, N.J. Oldham, Collision induced unfolding of protein ions in the
gas phase studied by ion mobility-mass spectrometry: the effect of ligand
binding on conformational stability, J. Am. Soc. Mass Spectrom. 20 (10) (2009)
References 1851–1858.
[24] S.J. Hyung, C.V. Robinson, B.T. Ruotolo, Gas-phase unfolding and disassembly
[1] A. Konijnenberg, A. Butterer, F. Sobott, Native ion mobility-mass spectrometry reveals stability differences in ligand-bound multiprotein complexes, Chem.
and related methods in structural biology, Biochim. Biophys. Acta 1834 (6) Biol. 16 (4) (2009) 382–390.
(2013) 1239–1256. [25] M.P. Nyon, et al., An integrative approach combining ion mobility mass
[2] K. Thalassinos, et al., Conformational states of macromolecular assemblies spectrometry, X-ray crystallography and NMR spectroscopy to study the
explored by integrative structure calculation, Structure 21 (9) (2013) conformational dynamics of alpha -antitrypsin upon ligand binding, Protein
1500–1508. Sci. 24 (8) (2015) 1301–1312.
[3] E. Jurneczko, et al., Probing the conformational diversity of cancer-associated [26] J.N. Rabuck, et al., Activation state-selective kinase inhibitor assay based on
mutations in p53 with ion-mobility mass spectrometry, Angew. Chem. Int. Ed. ion mobility-mass spectrometry, Anal. Chem. 85 (15) (2013) 6995–7002.
52 (16) (2013) 4370–4374. [27] S.D. Pringle, et al., An investigation of the mobility separation of some peptide
[4] K. Pagel, et al., Intrinsically disordered p53 and its complexes populate and protein ions using a new hybrid quadrupole/travelling wave IMS/oa-ToF
compact conformations in the gas phase, Angew. Chem. Int. Ed. 52 (1) (2013) instrument, Int. J. Mass Spectrom. 261 (1) (2007) 1–12.
361–365. [28] M.T. Marty, et al., Bayesian deconvolution of mass and ion mobility spectra:
[5] M. Wojnowska, et al., Autophosphorylation activity of a soluble hexameric from binary interactions to polydisperse ensembles, Anal. Chem. 87 (8)
histidine kinase correlates with the shift in protein conformational (2015) 4370–4376.
equilibrium, Chem. Biol. 20 (11) (2013) 1411–1420. [29] G.N. Sivalingam, et al., Amphitrite: a program for processing travelling wave
[6] G.R. Hilton, et al., C-terminal interactions mediate the quaternary dynamics of ion mobility mass spectrometry data, Int. J. Mass Spectrom. 345–347 (2013)
alpha B-crystallin, Philos. Trans. R. Soc. B-Biol. Sci. 368 (1617) (2013). 54–62.
[7] D.P. Smith, et al., Monitoring copopulated conformational states during [30] J.D. Eschweiler, et al., CIUSuite: a quantitative analysis package for collision
protein folding events using Electrospray ionization-ion mobility induced unfolding measurements of gas-phase protein ions, Anal. Chem. 87
spectrometry-mass spectrometry, J. Am. Soc. Mass Spectrom. 18 (12) (2007) (22) (2015) 11516–11522.
2180–2190. [31] T.M. Allison, et al., Quantifying the stabilizing effects of protein-ligand
[8] T. Wyttenbach, et al., The effect of calcium ions and peptide ligands on the interactions in the gas phase, Nat. Commun. 6 (2015) 8551.
relative stabilities of the calmodulin dumbbell and compact structures, J. [32] H. Hernandez, C.V. Robinson, Determining the stoichiometry and interactions
Phys. Chem. B 114 (1) (2010) 437–447. of macromolecular assemblies from mass spectrometry, Nat. Protoc. 2 (3)
[9] J.A. Leary, et al., Methodology for measuring conformation of (2007) 715–726.
solvent-disrupted protein subunits using T-WAVE ion mobility MS: an [33] M.F. Bush, et al., Collision cross sections of proteins and their complexes: a
investigation into eukaryotic initiation factors, J. Am. Soc. Mass Spectrom. 20 calibration framework and database for gas-phase structural biology, Anal.
(9) (2009) 1699–1706. Chem. 82 (22) (2010) 9557–9565.
[10] B.T. Ruotolo, et al., Evidence for macromolecular protein rings in the absence [34] K. Thalassinos, et al., Characterization of phosphorylated peptides using
of bulk water, Science 310 (5754) (2005) 1658–1661. traveling wave-based and drift cell ion mobility mass spectrometry, Anal.
[11] C.A. Scarff, et al., Travelling wave ion mobility mass spectrometry studies of Chem. 81 (2009) 248–254.
protein structure: biological significance and comparison with X-ray [35] Y. Zhong, S.J. Hyung, B.T. Ruotolo, Ion mobility-mass spectrometry for
crystallography and nuclear magnetic resonance spectroscopy structural proteomics, Expert Rev. Proteomics 9 (1) (2012) 47–58.
measurements, Rapid Commun. Mass Spectrom. 22 (2008) 3297–3304. [36] J.H. Holland, Genetic algorithms, Sci. Am. 267 (1) (1992) 66–72.
[12] S.L. Bernstein, et al., alpha-synuclein: stable compact and extended [37] A.E. Eiben, J. Smith, From evolutionary computation to the evolution of things,
monomeric structures and pH dependence of dimer formation, J. Am. Soc. Nature 521 (7553) (2015) 476–482.
Mass Spectrom. 15 (10) (2004) 1435–1443. [38] B. Gooptu, D.A. Lomas, Conformational pathology of the serpins: themes,
[13] K.B. Shelimov, et al., Protein structure in vacuo: gas-phase confirmations of variations and therapeutic strategies, Annu. Rev. Biochem. 78 (2009) 147–176.
BPTI and cytochrome c, J. Am. Chem. Soc. 119 (9) (1997) 2240–2248. [39] B. Gooptu, J.A. Dickens, D.A. Lomas, The molecular and cellular pathology of
[14] S.J. Valentine, et al., Disulfide-intact and -reduced lysozyme in the gas phase: ␣1-antitrypsin deficiency, Trends Mol. Med. 20 (2) (2014) 116–127.
conformations and pathways of folding and unfolding, J. Phys. Chem. B 101 [40] R.A. Stockley, A.M. Turner, alpha-1-Antitrypsin deficiency: clinical variability,
(19) (1997) 3891–3900. assessment, and treatment, Trends Mol. Med. 20 (2) (2014) 105–115.
[15] S.L. Bernstein, et al., Amyloid beta-protein: monomer structure and early [41] Y.P. Chang, et al., Small-molecule peptides inhibit Z alpha1-antitrypsin
aggregation states of A beta 42 and its Pro(19) alloform, J. Am. Chem. Soc. 127 polymerization, J. Cell. Mol. Med. 13 (2009) 2304–2316.
(7) (2005) 2075–2084. [42] A.S. Knaupp, et al., Kinetic instability of the serpin Z ␣1 -antitrypsin promotes
[16] B.T. Ruotolo, et al., Ion mobility-mass spectrometry reveals long-lived, aggregation, J. Mol. Biol. 396 (2010) 375–383.
unfolded intermediates in the dissociation of protein complexes, Angew.
Chem. Int. Ed. 46 (42) (2007) 8001–8004.

You might also like