
VIEWPOINT

Translational Proteomics www.proteomics-journal.com

Addressing the Challenges of High-Throughput Cancer Tissue Proteomics for Clinical Application: ProCan
Brett Tully,* Rosemary L. Balleine, Peter G. Hains, Qing Zhong, Roger R. Reddel,
and Phillip J. Robinson

Dr. B. Tully, Prof. R. L. Balleine, Dr. P. G. Hains, Dr. Q. Zhong, Prof. R. R. Reddel, Prof. P. J. Robinson
ProCan
Children’s Medical Research Institute
Faculty of Medicine and Health
The University of Sydney
Westmead, NSW 2145, Australia
E-mail: btully@cmri.org.au

The ORCID identification number(s) for the author(s) of this article can be found under https://doi.org/10.1002/pmic.201900109
DOI: 10.1002/pmic.201900109

The cancer tissue proteome has enormous potential as a source of novel predictive biomarkers in oncology. Progress in the development of mass spectrometry (MS)-based tissue proteomics now presents an opportunity to exploit this by applying the strategies of comprehensive molecular profiling and big-data analytics that are refined in other fields of ‘omics research. ProCan (ProCan is a registered trademark) is a program aiming to generate high-quality tissue proteomic data across a broad spectrum of cancer types. It is based on data-independent acquisition–MS proteomic analysis of annotated tissue samples sourced through collaboration with expert clinical and cancer research groups. The practical requirements of a high-throughput translational research program have shaped the approach that ProCan is taking to address challenges in study design, sample preparation, raw data acquisition, and data analysis. The ultimate goal is to establish a large proteomics knowledge-base that, in combination with other cancer ‘omics data, will accelerate cancer research.

This era of possibility for matching specific anti-cancer agents to points of vulnerability in cancer cells has led to an urgent search for novel predictive biomarkers. The drive comes from experience with highly successful therapy-biomarker matches including detection of direct therapeutic targets in cancer cells, such as HER2 amplification in breast cancer,[1] and more indirect indicators of benefit such as BRCA1/2 mutation status and response to PARP1 inhibitors in ovarian cancer.[2] It has also become integral to the design of early phase clinical trials of molecularly targeted agents, with biomarkers forming part of inclusion criteria, even across disparate cancer types.[3]

The evolution of methods for high-throughput genomic and transcriptomic profiling of cancer has enabled this field of discovery and practice change. A major limitation is that genomics analysis currently succeeds in identifying “actionable” abnormalities in a minority of cancer patients,[4] and robust predictive biomarkers are lacking for key targeted therapeutics, including immune checkpoint inhibitors and VEGF-targeted agents.[5,6]

The cancer tissue proteome is an under-explored domain with huge potential for novel biomarker discovery because proteins are the chief structural and signaling molecules in cancer cells. Moreover, the opportunity to profile the complex cancer tissue milieu could be informative for predicting response to agents that have indirect modes of action. To date, the contribution of cancer tissue proteomics to biomarker discovery has been hampered by poor reproducibility. However, the development of high-sensitivity quantitative MS techniques coupled with improvements in tissue-processing methods and big data analytics now advance the prospects for both biomarker discovery and clinical application.

There are many challenges to overcome before the potential of proteomics in oncology is realized. Chief among these is to establish a knowledge-base of sufficient size and scope to address complex clinical issues. To date, MS-based tissue proteomics has been a relatively low-throughput technology and most cohorts in published cancer studies are small. To enable robust discovery, a large number of cancer samples must be analyzed in a consistent, or at least a comparable way. The corollary is that a commitment is needed to develop core processes, methods, and infrastructure that can operate at scale, and without compromise to the central tenets of analytical validity and clinical utility that are critical for clinical application.

1. ProCan

The Australian Cancer Research Foundation International Centre for the Proteomics of Human Cancer (ProCan) is an effort to advance the field of cancer diagnostics by contributing pan-cancer tissue proteomics data. At its core is a custom-built MS facility currently housing six instruments that function as a single operational unit with capacity of around 10 000 sample runs per year. Over the next 5 years, ProCan aims to profile cancers, pediatric and adult, solid and hematologic, and representing most known cancer types through a network of collaborations with expert clinicians and other cancer researchers. The overall goal is to develop clinically applicable biomarkers by addition of

Proteomics 2019, 1900109 1900109 (1 of 5) 


© 2019 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.advancedsciencenews.com www.proteomics-journal.com

Figure 1. ProCan flywheel. The principal objective of ProCan is to build a body of insight and knowledge about cancer. This is an iterative process where
each cycle, driven by a single question of unmet clinical need, enables a body of knowledge to be created and the proteomic landscape of cancer to
be expanded. Around the wheel, there are loci of activities, each of which contain many challenges and decision points, especially those related to the
practicalities of performing high-quality science in a biobank-scale context. As the landscape grows, pan-cancer analyses become possible, offering a
new and unique perspective that we expect will enable insights not obtained through other means.

high-quality proteomics data to existing stores of clinical and molecular information (Figure 1).

The ProCan program is structured as a series of individual hypothesis-driven research studies focused on individual cancer types that combine to form a pan-cancer knowledge-base over the course of the program. Operational decision-making has been shaped by the practical requirements of clinical research and high-throughput proteomic studies performed at biobank-scale. Here, we describe major challenges we have identified and our approach to addressing some of these.

1.1. Study Design

Individual studies that comprise the majority of ProCan research are conducted in collaboration with expert clinical or research groups. Priority research collaborations are based on the availability of tissue sample collections that are richly annotated with clinical data, and ideally have other ‘omic data available for integrated analysis with proteomics (Figure 2). A major consideration in study design is to ensure that the hypotheses to be tested address areas of “unmet clinical need” relevant to specific cancers. In this respect, input from domain-expert collaborators at the stage of project planning and monitoring is critical. In the future, biomarker-seeking sub-studies integrated into clinical trials may provide opportunities to facilitate these interactions.

A challenge in conducting a large number of collaborative projects across institutions and jurisdictions has been to develop systems to manage regulatory processes, including human research ethics/institutional review board approvals, materials transfer, and other inter-institutional agreements. The time and resource cost of this aspect is substantial. An advantage is that experience in this area has been advanced by the conduct of large international collaborative studies over the past two decades, and our experience indicates a high level of familiarity with relevant issues as well as dedicated expertise available in many institutions to support researchers dealing with these challenges.

The key to clinical application of tissue-based proteomics is to adapt to the practical requirements of clinical workflow. In the discovery phase, a major implication for ProCan is to prioritize analysis of formalin-fixed paraffin-embedded (FFPE) over fresh-frozen (FF) tissue samples where possible. Our experience with FFPE tissue proteomics is consistent with other reports showing that high-quality data can be generated and that the scale and scope of quantifiable proteins is comparable with FF tissues.[7] However, there may be differences in the proteomic profiles of FFPE and FF samples that require definition of distinct classifiers for the two sample types. Furthermore, in order to ensure throughput, we are focusing on analysis of unmodified peptides. However, re-analysis of publicly available ProCan data by us or others will enable questions currently out of scope to be explored in the future.

1.2. Sample Preparation

Cancer tissue samples are a complex mix of malignant, reactive, and normal elements and diagnostic interpretation relies on morphological assessment by a specialist pathologist. To interpret data from proteomics, which is tissue disruptive, it is imperative that the components of a sample submitted for MS are known. Isolation of cancer cells from tissue by microdissection is a strategy to directly assess their proteomic features. However, this is a very labor-intensive process that is not well suited to high-throughput, or clinical application. It is also likely that a proteomic profile from the tumor microenvironment will be informative and is important to capture.

The strategy that ProCan is using to address these issues is to prepare adjacent sections from tissue blocks for matched histopathologic review and proteomics analysis. A technical challenge that this has created in the case of FF samples is to develop efficient methods for removal of optimal cutting temperature compound prior to MS. In the future, it may become possible to deconvolute a tissue proteomic profile to


Figure 2. Schematic of ProCan matrix. ProCan aims to survey the proteomic landscape of all types of cancer via a series of individual collaborative projects
(represented here as radial panels). A typical project is designed to address a clinically relevant question by analyzing a cohort of cancer samples that
has been collected by a collaborating research group and annotated with clinical data, including response to treatment and survival. Wherever possible,
a matched section of the tumor—immediately adjacent to tissue processed for mass spectrometry—is analyzed by histopathology review. Cohorts for
which various combinations of other ‘omic data are available (indicated here by colored segments; WGS, whole genome sequencing; WES, whole exome
sequencing) are prioritized, so that correlation between proteomics and other ’omics can be addressed. The use of a single proteomics technology
across all ProCan projects will facilitate a pan-cancer overview of the human cancer tissue proteome.

estimate the contribution of different tissue elements, as has been achieved to a degree with transcriptomic data.[8] ProCan is curating a large-scale database of digitized whole slide images anticipating that it will become possible for machine learning techniques to enable more direct integration of histopathology and tissue proteomic profiles.

After samples are collected, tissue lysis and digestion protocols must be rapid, efficient, reproducible, and broadly applicable to tissues of different kinds and from different source laboratories. In addition, the methodology should be adaptable for integration of robotics to facilitate throughput where possible. ProCan has instituted the use of pressure-cycling technology (barocyclers) to achieve consistent lysis and digestion of tissue samples.[9] In the establishment phase, we have made considerable effort to refine protocols that achieve shorter processing times while improving digestion and peptide yield, and maintaining reproducibility and sample quality. For example, simultaneous Lys-C and tryptic digestion is used in an accelerated manner as an aid to sample throughput.[10]

1.3. Raw Data Acquisition

LC-MS/MS is a highly sensitive, mature technology capable of identifying and providing relative quantitation of many thousands of proteins in a tissue sample in a 1 to 2 h run time. ProCan research is largely based on a variant of label-free LC-MS/MS that is particularly well suited to comparative analysis of a large number of tissue samples; Sequential Window Acquisition of all THeoretical mass spectra/Data-Independent Acquisition (SWATH/DIA).[11] DIA/SWATH gives close to complete detection of peptides from tryptic digestion of complex samples, and its applicability to comparative cancer tissue analysis has been demonstrated.[9] It does rely on availability of relevant spectral reference libraries, and our strategy is to use two-dimensions of chromatography from pooled samples to make libraries representative of all samples analyzed.

There are many practical considerations in the design and operation of a biobank-scale LC-MS/MS facility. A key for ProCan is to use standardized approaches to SWATH/DIA data capture, and to give priority to time- and cost-saving modifications, including use of readily available reagents and components such as commercially available capillary columns. In addition, the entire system requires constant monitoring and quality control (QC). ProCan has enabled continuous QC by selected use of simple (bovine serum albumin) and complex (HEK293 cells) digests coupled to automated monitoring systems and protocols. These external controls are in addition to synthetic peptide standards that have been selected to span the retention time of the experiment and are included in every injection.
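
The continuous QC on spiked synthetic peptide standards could be automated along the following lines. This is a minimal sketch under stated assumptions, not a description of ProCan's monitoring system: the function name `flag_rt_drift`, the five-injection window, and the 0.5 min tolerance are hypothetical choices made purely for illustration. The sketch compares the retention time of one standard peptide in each injection against the median of the preceding injections and flags large shifts.

```python
from statistics import median


def flag_rt_drift(rt_minutes, window=5, tolerance=0.5):
    """Return indices of injections whose standard-peptide retention time
    differs from the median of the preceding `window` injections by more
    than `tolerance` minutes.

    All parameter defaults are illustrative assumptions, not published values.
    """
    flagged = []
    for i in range(window, len(rt_minutes)):
        # Baseline is the median of the immediately preceding injections.
        baseline = median(rt_minutes[i - window:i])
        if abs(rt_minutes[i] - baseline) > tolerance:
            flagged.append(i)
    return flagged


# Hypothetical retention times (minutes) of one standard across injections.
retention_times = [30.1, 30.0, 30.2, 30.1, 30.0, 30.1, 31.2, 30.1]
print(flag_rt_drift(retention_times))  # prints [6]
```

A flagged injection would prompt inspection of the column or LC system before further samples are run; in practice the window and tolerance would need tuning for each standard and instrument.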


LC-MS/MS instrumentation requires intensive maintenance for optimal performance, especially in a multi-instrument environment. Decisions such as the amount of sample injected (2 μg) and the choice of a 90 min LC gradient, along with pre-emptive maintenance help limit the downtime of the facility. A challenge in ensuring this over the long-term is dependency on a highly skilled scientific workforce who can prepare samples as well as operate and maintain the equipment. A firm commitment to develop this skill-base is critical to ensuring that proteomics can make a sustainable contribution to clinical practice.

1.4. Single Study–Based Data Analysis

Proteomics faces the general ‘omics challenges of high dimensionality and sparsity of data that comes from generating a large number of measurements from a relatively small number of samples. It is also subject to the universal truths that study limitations can be overcome by the availability of independent validation datasets of appropriate size, and that there is no statistical correction that can truly compensate for the absence of such data. To make progress through the early period of collecting data resources, data scientists need to be alert to potential confounders and choose between a wide range of strategies for data pre-processing, including batch correction, data normalization, handling missing data values, and selecting informative features. Each approach has strengths and weaknesses, and while some guidance comes from experience with other cancer ‘omic data, there are issues of particular relevance to proteomics.

SWATH/DIA proteomics initially employs a peptide-centric scoring strategy to generate a peptide matrix, followed by inference of proteins.[11] These two steps each involve complex statistical tests and stringent multiple-testing correction methods. Peptide “roll-up” to proteins has the effect of substantially reducing the size of the dataset, both by absorbing a number of peptides into an individual protein assignation and by rejecting a volume of peptide data that cannot be confidently assigned to a protein. This becomes a form of a priori data filtering; however, it is currently unclear whether useful information is lost or false signal is introduced during this process. The proteome is extremely complex, with multiple proteoforms corresponding to a single gene sequence. This complexity may be better reflected in the peptide rather than a protein matrix, and it is possible that class discovery analyses based on the more finely granular peptide-level data will have advantages. To fully explore this, the performance of both peptide and protein-level classifiers will need to be tested.

A challenging feature of proteomic data is the high rate of missing values. It is likely that a proportion of this is non-random and related to threshold detection of low abundance peptides or real differences in protein expression between tissue sample types, reflecting potentially informative biological differences. A number of strategies to manage missing values have been proposed, including filtering out features with missing data and imputing missing data values. The development of classification models that are robust to missing values is of particular interest, and there is some evidence that these may have advantages.[12]

1.5. Multi-Study Data Analysis

Ultimately, ProCan’s goal is a highly curated collection of sample-related data, proteomics, and other ‘omic data to support global analyses. There remain significant open questions regarding the synthesis of these disparate data sources into a single cohesive analytics platform. We anticipate many challenges, ranging from the essentially technical issue of merging spectral libraries across cohorts, through complex normalization and batch correction, to the integrative analysis of multi-omic data with clinical outcomes.

ProCan data will provide a valuable resource beyond their primary use, once they become accessible to collaborators and the wider scientific community. To facilitate this, the curated data must sit within a FAIR framework,[13] that is, Findable by both humans and machines, Accessible using standard protocols, Interoperable with other systems and data resources, and Reusable and reproducible via richly described metadata. This multitude of data exists within a complex taxonomy that must be contextualized with biological knowledge, either garnered from existing public databases or from software applications. Traditional relational databases, although widely adopted across all disciplines, can limit exploration of highly connected data such as these; however, graph-structured database technology, industrialized in fields such as fraud detection, national security, and social networks, has been shown to facilitate this in biological research.[14]

An advantage of SWATH/DIA is that raw data files can form a permanent digital map available for re-analysis through computational pipelines as analytical strategies improve, without needing to re-run the physical samples. Data commons–like infrastructure enables this across tens or even hundreds of thousands of samples, ensuring that the current “release” of processed data reflects the state-of-the-art. Within the commons, data must be classified for access at various levels from fully public to tightly controlled, depending on the “trust level” of the user and the sensitivity of the data. Importantly, such a system must adhere to relevant jurisdictional privacy and ethics regulations.

For cancer clinicians and researchers to develop confidence in high-throughput proteomics, individual studies must demonstrate intrinsic robustness and consistency between laboratories and over time. The international proteomics community has acknowledged this by promulgating a commitment to transparency and quality in proteomics, chiefly through sharing data, detailed documentation of study metadata, reference datasets, and reagents.[15] These efforts are supported by national and international consortia such as the Clinical Proteomic Tumor Analysis Consortium and the International Cancer Proteogenomics Consortium (of which ProCan is a member) that were given impetus in 2016 by the U.S. Cancer Moonshot initiative’s focus on proteogenomics.[16] Standards set by journal editors have an important role here, with a recent guideline for submission of DIA-MS studies a useful exemplar.[17] The contribution of large-scale projects such as ProCan to this endeavor will include a commitment to develop and test a system of standard operating procedures and critical evaluation of quality, both internally and through peer review mechanisms.
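
To make the peptide "roll-up" and missing-value filtering discussed in Section 1.4 concrete, here is a deliberately simplified sketch. It is not the SWATH/DIA scoring and protein inference procedure of ref. [11], which relies on statistical tests and multiple-testing correction. The median summary, the `max_missing` threshold, and every name and value below are assumptions chosen for illustration only.

```python
from statistics import median


def roll_up(peptide_rows, peptide_to_protein, max_missing=0.5):
    """Collapse a peptide-by-sample matrix into a protein-by-sample matrix.

    peptide_rows: {peptide: [intensity or None, one entry per sample]}
    peptide_to_protein: {peptide: protein}; peptides absent from this
        mapping are treated as not confidently assigned and are dropped.
    max_missing: peptides with a larger fraction of missing values are
        filtered out (hypothetical threshold).
    """
    grouped = {}
    for peptide, values in peptide_rows.items():
        protein = peptide_to_protein.get(peptide)
        if protein is None:
            continue  # unassigned peptide: rejected during roll-up
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction > max_missing:
            continue  # too sparse: filtered rather than imputed
        grouped.setdefault(protein, []).append(values)

    proteins = {}
    for protein, rows in grouped.items():
        summary = []
        for sample_values in zip(*rows):
            observed = [v for v in sample_values if v is not None]
            # Median of observed peptides; None if nothing was measured.
            summary.append(median(observed) if observed else None)
        proteins[protein] = summary
    return proteins


# Hypothetical data: four peptides measured in three samples.
peptides = {
    "PEPA": [10.0, 12.0, None],
    "PEPB": [14.0, None, 9.0],
    "PEPC": [None, None, 5.0],   # two of three values missing: filtered
    "PEPD": [7.0, 7.0, 7.0],     # no protein assignment: dropped
}
mapping = {"PEPA": "P1", "PEPB": "P1", "PEPC": "P2"}
print(roll_up(peptides, mapping))  # prints {'P1': [12.0, 12.0, 9.0]}
```

Four peptide rows reduce to a single protein row, and the information carried by PEPC and PEPD is discarded. This is the kind of a priori filtering whose effect on downstream classifiers remains an open question.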


2. Concluding Comments

Sustained effort in cancer tissue–based proteomics research is expected to provide substantial clinical benefit. Success will require input from a broad range of stakeholders, including cancer consumers, cancer biologists, oncologists, pathologists, proteomicists, software engineers, data scientists, health economists, regulators, and research funding agencies. Our experience to date with ProCan indicates that a high level of community support exists, and that clinical application of high-throughput proteomics will be achievable.

Acknowledgements

ProCan is supported by the Australian Cancer Research Foundation, the Cancer Institute New South Wales (NSW) (2017/TPG001, REG171150), the NSW Ministry of Health (CMP-01), the University of Sydney, the National Breast Cancer Foundation (IIRS-18-164), the Cancer Council NSW (IG 18-01), Ian Potter Foundation, the Commonwealth of Australia through the Medical Research Futures Fund (MRFF-PD), and the National Health and Medical Research Council of Australia (GNT1047070, GNT1170739).

Conflict of Interest

The authors declare no conflict of interest.

Keywords

cancer, data analysis, data-independent acquisition, proteomics, sequential window acquisition of all theoretical mass spectra–mass spectrometry

Received: April 30, 2019
Revised: July 11, 2019
Published online:

[1] S. Loibl, L. Gianni, Lancet 2017, 389, 2415.
[2] S. A. Cook, A. V. Tinker, BioDrugs 2019, 33, 255. PMID: 30895466.
[3] P. Janiaud, S. Serghiou, J. P. A. Ioannidis, Cancer Treat. Rev. 2019, 73, 20.
[4] J. Marquart, E. Y. Chen, V. Prasad, JAMA Oncol. 2018, 4, 1093.
[5] G. T. Gibney, L. M. Weiner, M. B. Atkins, Lancet Oncol. 2016, 17, e542.
[6] C. Bais, B. Mueller, M. F. Brady, R. S. Mannel, R. A. Burger, W. Wei, K. M. Marien, M. M. Kockx, A. Husain, M. J. Birrer, N. R. G. Oncology/Gynecologic Oncology Group, J. Natl. Cancer Inst. 2017, 109, djx066.
[7] O. J. Gustafsson, G. Arentz, P. Hoffmann, Biochim. Biophys. Acta 2015, 1854, 559.
[8] K. Yoshihara, M. Shahmoradgoli, E. Martinez, R. Vegesna, H. Kim, W. Torres-Garcia, V. Trevino, H. Shen, P. W. Laird, D. A. Levine, S. L. Carter, G. Getz, K. Stemke-Hale, G. B. Mills, R. G. Verhaak, Nat. Commun. 2013, 4, 2612.
[9] T. Guo, P. Kouvonen, C. C. Koh, L. C. Gillet, W. E. Wolski, H. L. Rost, G. Rosenberger, B. C. Collins, L. C. Blum, S. Gillessen, M. Joerger, W. Jochum, R. Aebersold, Nat. Med. 2015, 21, 407.
[10] N. Lucas, A. B. Robinson, M. Marcker Espersen, S. Mahboob, D. Xavier, J. Xue, R. L. Balleine, A. deFazio, P. G. Hains, P. J. Robinson, J. Proteome. Res. 2019, 18, 399.
[11] C. Ludwig, L. Gillet, G. Rosenberger, S. Amon, B. C. Collins, R. Aebersold, Mol. Sys. Biol. 2018, 14, e8126.
[12] M. Ali, S. A. Khan, K. Wennerberg, T. Aittokallio, Bioinformatics 2018, 34, 1353.
[13] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. 't Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, et al., Sci. Data 2016, 3, 160018.
[14] B. H. Yoon, S. K. Kim, S. Y. Kim, Genom. Informat. 2017, 15, 19.
[15] C. R. Kinsinger, J. Apffel, M. Baker, X. Bian, C. H. Borchers, R. Bradshaw, M. Y. Brusniak, D. W. Chan, E. W. Deutsch, B. Domon, J. Gorman, R. Grimm, W. Hancock, H. Hermjakob, D. Horn, C. Hunter, P. Kolar, H. J. Kraus, H. Langen, R. Linding, R. L. Moritz, G. S. Omenn, R. Orlando, A. Pandey, P. Ping, A. Rahbar, R. Rivers, S. L. Seymour, R. J. Simpson, D. Slotta, et al., Mol. Cell. Proteom. 2011, 10, O111.015446.
[16] C. R. Jimenez, H. Zhang, C. R. Kinsinger, E. C. Nice, Clin. Proteom. 2018, 15, 4.
[17] R. J. Chalkley, M. J. MacCoss, J. D. Jaffe, H. L. Rost, Mol. Cell. Proteom. 2019, 18, 1.
