OPEN ACCESS

Commentary
Artificial intelligence in breast cancer diagnostics
Caterina A.M. La Porta1,2,* and Stefano Zapperi3,4
1Department of Environmental Science and Policy, Center for Complexity & Biosystems, University of Milan, via Celoria 10, 20133 Milan, Italy
2CNR - Consiglio Nazionale delle Ricerche, Istituto di Biofisica, via Celoria 10, 20133 Milan, Italy
3Department of Physics, Center for Complexity & Biosystems, University of Milan, via Celoria 16, 20133 Milan, Italy
4CNR - Consiglio Nazionale delle Ricerche, Istituto di Chimica della Materia Condensata e di Tecnologie per l’Energia, Via R. Cozzi 53, 20125 Milano, Italy
*Correspondence: caterina.laporta@unimi.it
https://doi.org/10.1016/j.xcrm.2022.100851

Since breast cancer deaths are mainly due to metastasis, predicting the risk that a primary tumor will develop metastasis after a first diagnosis is a central issue that could be addressed by artificial intelligence. To overcome the problem posed by limited availability of standardized datasets, algorithms should include biological insight.

Cell Reports Medicine 3, 100851, December 20, 2022. © 2022 The Author(s).
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

The risk of metastasis in breast cancer
Cancer is one of the leading causes of death in the western world and it is increasing in the developing world. Within cancers, breast cancer takes a terrible toll on women, with 500,000 deaths reported each year. Nowadays, people do not die from the primary tumor but instead die from secondary tumors called metastases, which account for 90% of tumor mortality. According to World Health Organization (WHO) statistics, about one-third of these cancer fatalities could have been avoided through earlier detection and treatment. A critical barrier to developing effective drugs to treat cancer metastasis is the high heterogeneity of tumor cells, implying that each cell of a specific tumor is slightly different from the others.1 Plasticity is one of the emerging properties of tumor cells that helps them to escape from a drug’s effects, leading to the development of drug resistance. We discussed this point in a recent book focused on phenotypic switching where we highlighted three important issues2: (1) the impact of the environment on the plasticity of the tumor cells; (2) the correlation between senescence and plasticity; and (3) the role of phenotypic switching in inducing collective cell migration. Environment-modulated cancer cell plasticity was clearly shown in different types of tumors, including breast cancer.3 This implies the possibility to have dormant cancer cells if the surrounding environment permits it. On the other hand, treatment with specific drugs could help select cell subpopulations that will remain dormant, leading to drug resistance. An additional mechanism is provided by senescent cells that can contribute to keep the cells viable for an extended period, exhibiting a senescence-associated secretory phenotype, characterized by the release of many factors including proinflammatory cytokines and chemokines, highlighting again the complex interaction between cells and environment. Furthermore, senescent cells can revert their phenotype, resuming their growth.4 Finally, since cell migration is a key aspect of aggressiveness, plasticity of cancer invasion and metastasis depends on the ability of cancer cells to switch between collective and single-cell dissemination through the regulation of cadherin-mediated cell-cell junctions.5

The strong heterogeneity and plasticity of breast cancer represent a serious issue for an effective treatment, since most currently available drugs are designed to target general biological aspects of the tumors, without considering the specificity of each tumor in each patient. Predicting the individual risk of aggressiveness of primary breast cancer would allow physicians to choose the best therapeutic strategy, limiting overtreatment and side effects that are detrimental for the patient’s quality of life. There is therefore a pressing need to develop predictive tools for personalized therapies that could be more sustainable and economically affordable. This aspect appears particularly urgent in the context of immunotherapy, which is very effective in some cases but also extremely expensive. Clearly it is very important to identify in advance the patients that are most likely to respond. In this respect, artificial intelligence (AI) holds great promise to reach the goal of stratifying breast cancer patients according to the aggressiveness of their specific tumor, their individual risk of metastasis, and their likelihood to respond to a given therapy.

Can AI predict the future?
We can identify two main pathways for the application of AI to breast cancer diagnostics, the first relying on image analysis and the second on molecular data. State-of-the-art deep learning algorithms, when trained with large datasets of annotated images, enable very precise image classification and can easily be deployed on histological images. In the context of breast cancer, deep learning methods are able to reliably assess whether a histological image is referring to a normal tissue, a benign tumor, in situ carcinoma, or an invasive carcinoma.6 While these tools show great promise in assisting the pathologist in the diagnosis after biopsy, we are still far from accurately classifying the metastatic risk or predicting the likelihood of response to a specific treatment. It is not clear if it will ever be possible to extract such fine-grained information from images alone, even if we increase the training set by collecting more images. Tissue morphology might not display enough features to enable a precise prediction of clinical outcome and images might suffer from technical biases due to preparation
protocols that might vary among institutions. Furthermore, in order to apply these methods, it will be necessary to build up large and accessible databases of digital images for training the algorithm. Efforts along these lines are underway in many western countries but we are still far from reaching this goal globally.

While there is a large literature investigating genomic mutations, we focus here on gene expression data, which represent (in our opinion) a more promising area of study. Since tumor cells are plastic, gene expression data provide a detailed fingerprint of the tumor at a particular moment in time. One can thus conceive that the expression level of all the genes could encode important information on the phenotype of cancer cells which could be exploited to make predictions. Molecular subtyping of breast cancer is already well established and is based on the expression of estrogen receptor (ER), progesterone receptor (PR), the human epidermal growth factor receptor 2 (HER2), and the proliferation marker Ki67. Combination of these factors leads to four standard subtypes: Luminal A (ER+ and/or PR+, HER2−, Ki67low), Luminal B (ER+ and/or PR+, HER2−, Ki67high), HER2 positive (HER2+), and triple negative (ER−, PR−, HER2−). While standard clinical guidelines are associated with each of these subtypes, a large heterogeneity is present within each subtype. Due to the growing availability of transcriptomic data for breast cancer, AI methods have increasingly been used to better stratify patients within each molecular subtype.

Early studies applied machine learning methods to the whole transcriptome with the aim of identifying patients with higher risk of tumor relapse and low rate of survival.7,8 The studies focused on the Luminal A subtype and identified a list of genes that, according to the algorithms, best correlated with clinical outcome in the training set. The list was then used to establish a classifier that could be used to screen new patients after validation in a test set. In a similar spirit, a widespread approach to stratify triple-negative breast cancer is based on K-means clustering of whole transcriptomic data, resulting in the establishment of 6 subgroups showing differential response to treatment but limited differences in terms of relapse-free survival.9

Unfortunately, these kinds of brute-force algorithms suffer from serious problems, mainly due to the relative shortage of data available for training. A transcriptome contains roughly Ng = 20,000 genes and each of them potentially represents a relevant feature to predict clinical outcome. The number of samples (Ns) used to train the machine learning algorithm is, however, typically much smaller than Ng, amounting in most cases to a few hundred for each breast cancer subtype. This problem is well known as the “curse of dimensionality.” When the dimensionality of the objects under study increases, the available data become effectively sparse. Reliable results can often be obtained only with a training set that is much larger than the dimensionality of the object, in order to ensure that there are several samples for each combination of gene values. In practice, we would need to train a classifier using tens of thousands of gene expression data to obtain a reliable classification.10

A concrete and vivid example of the problems caused by the high dimensionality of transcriptomic data is provided by earlier classification attempts of Luminal A patients based on machine learning,7,8 which we mentioned above. As pointed out by Drier and Domany,11 the gene lists obtained by two independent studies using two independent patient groups but similar machine learning algorithms showed no overlap. This observation calls into question the reliability of the methodology used to establish the gene lists in the first place. It turns out that the limited success that these methods still achieve in stratifying the patient’s clinical outcome results from basic differences in the proliferation capabilities of the tumors. Proliferation correlates with the activity of many genes in the transcriptome and therefore the activity of virtually any set of genes can be used to stratify patients. In the case of Luminal A, we do not need AI to stratify patients: indeed, proliferation-related marker genes are commonly used to this end. Unfortunately, a similar strategy is not applicable to the other breast cancer subtypes where aggressiveness depends on more than cell proliferation.

We should also mention that the issue of patient stratification is further complicated by the presence of “batch effects,” which prevent the straightforward merging of datasets obtained in different experiments. Experimental details, such as the protocol followed to collect the samples and to extract the genetic material or the platform used to sequence it, can have an important effect on gene expression data, hiding the true biological variability of the dataset. If batch effects are not removed by suitable algorithms,12 an AI algorithm may classify the samples according to their batch rather than their biological characteristics, providing results that would be of little practical use.

Since adding different datasets coming from different studies is problematic and the number of available samples Ns is often limited, an alternative strategy is to reduce the effective dimension Ng of the transcriptome by shifting the attention from genes to pathways. A pathway is a relatively small set of genes working together for a given biological function. Since the number of genes in a pathway rarely exceeds Ng = 100, with a number of samples Ns > 100 one can overcome the curse of dimensionality. The idea was pioneered by Eytan Domany and his group, who introduced pathway deregulation scores (PDS) as a method to identify which pathways are deregulated in individual breast cancer patients.13 The method quantifies the overall deregulation of each pathway with respect to a reference sample by fitting a non-parametric, non-linear one-dimensional principal curve through the subspace of the transcriptome defined by the genes of the pathway. PDS can be computed for all known pathways, providing a more coarse-grained picture of the transcriptome of an individual. Clustering algorithms, when applied to the PDS of breast cancer patients, reveal new patient classes with specific drug response and survival statistics.12

Guiding artificial intelligence with biological insight
From our discussion, it should now be clear that while AI methods based on artificial neural networks are incredibly powerful in many domains, including medicine, their straightforward application to disentangle cancer heterogeneity faces important challenges. To fully exploit the potential of AI in providing reliable breast cancer patient stratification strategies that can predict the individual response to a specific treatment or the risk of metastasis and survival, we would need extremely numerous and homogeneous data for training. Such data
are at present unfortunately not available. It is, however, still possible to make important progress with the data we have, but we should move away from brute-force black-box type algorithms and exploit the large trove of biological knowledge accumulated in the past decades to design smarter and more targeted algorithms for patient stratification.

We followed this strategy in recent years by developing ARIADNE, an algorithm to stratify the aggressiveness of the tumor in triple-negative breast cancer patients, based on their gene expression data.14 The biological observation underlying the algorithm is related to the epithelial-mesenchymal transition (EMT), which describes how polarized epithelial (E) cells transform into mesenchymal (M) cells by down-regulating intracellular adhesion molecules and promoting cell polarity. EMT can sometimes give rise to hybrid E/M cells that display features of both E and M phenotypes, leading to collective invasive capability and increased aggressiveness of the tumor. The EMT is regulated by a complex network involving several genes (Ng = 72), which we have recapitulated in silico by a Boolean network model.15 The model provides a landscape of all possible cell phenotypes that can be used as a reference map for gene expression data coming from individual breast cancer patients. ARIADNE can perform the projection from gene expression data to the landscape and allows us to identify the patients whose tumors contain a signature of aggressive hybrid states. Notice that these hybrid states could not be identified by measuring the expression of a set of genes but only by considering interactions among genes within the gene regulatory network.15 Cross validation with clinical data (with Ns > 500) confirmed that the high-risk triple-negative breast cancer patients identified by the ARIADNE algorithm indeed show a higher risk of relapse and low survival.14 While the algorithm has been validated with triple-negative breast cancer patients, the strategy is fully general and could readily be extended to other breast cancer subtypes and potentially also other tumors.

Conclusions
Our discussion of recent applications of AI to breast cancer diagnostics suggests that the most effective strategies use a combination of algorithmic ingenuity and biological insight. Out-of-the-box AI algorithms are widely available and are extremely successful in many different areas where large-scale datasets for training are available. Straightforward application of these algorithms to stratify breast cancer patients is hampered by the limited number of available transcriptomic data. On the other hand, the deployment of deep learning algorithms to histological images has provided a promising diagnostic tool for early breast cancer detection that is likely to improve further thanks to the growing availability of images needed for training. It is unclear, however, if histological images include enough information to discriminate the future evolution of the tumor. Combining images with gene expression data might lead to interesting developments in the near future.

AUTHOR CONTRIBUTION
Conceptualization, outlining, submitting, C.A.M.L.P., S.Z.; first drafting, C.A.M.L.P.; revising and editing, C.A.M.L.P., S.Z.

DECLARATION OF INTEREST
The authors declare the following competing interests: Complexdata S.R.L. has filed an Italian patent application related to the present work. Inventors: F. Font-Clos, S. Zapperi, C.A.M. La Porta. Patent status: granted. Date of application: 13/12/2019. Application number: 102,019,000,023,946. The patent concerns a method to screen breast cancer patients using transcriptomic data and Boolean networks. C.A.M.L.P. and S.Z. hold 14.72% and 7.36% shares of Complexdata S.R.L., respectively.

REFERENCES
1. Dagogo-Jack, I., and Shaw, A.T. (2018). Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 15, 81–94. https://doi.org/10.1038/nrclinonc.2017.166.
2. La Porta, C.A., and Zapperi, S. (2020). Phenotypic Plasticity: The Emergence of Cancer Stem Cells and Collective Cell Migration. In Phenotypic Switching (Academic Press), pp. 639–649.
3. Wahl, G.M., and Spike, B.T. (2017). Cell state plasticity, stem cells, EMT, and the generation of intra-tumoral heterogeneity. NPJ Breast Cancer 3, 1–13.
4. La Porta, C.A.M., Zapperi, S., and Sethna, J.P. (2012). Senescent cells in growing tumors: population dynamics and cancer stem cells. PLoS Comput. Biol. 8, e1002316. https://doi.org/10.1371/journal.pcbi.1002316.
5. Ilina, O., Gritsenko, P.G., Syga, S., Lippoldt, J., La Porta, C.A.M., Chepizhko, O., Grosser, S., Vullings, M., Bakker, G.J., Starruß, J., et al. (2020). Cell–cell adhesion and 3D matrix confinement determine jamming transitions in breast cancer invasion. Nat. Cell Biol. 22, 1103–1115. https://doi.org/10.1038/s41556-020-0552-6.
6. Aresta, G., Araújo, T., Kwok, S., Chennamsetty, S.S., Safwan, M., Alex, V., Marami, B., Prastawa, M., Chan, M., Donovan, M., et al. (2019). BACH: Grand challenge on breast cancer histology images. Med. Image Anal. 56, 122–139. https://doi.org/10.1016/j.media.2019.05.010.
7. Van’t Veer, L.J., Dai, H., Van De Vijver, M.J., He, Y.D., Hart, A.A.M., Mao, M., Peterse, H.L., Van Der Kooy, K., Marton, M.J., Witteveen, A.T., et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536. https://doi.org/10.1038/415530a.
8. Wang, Y., Klijn, J.G., Zhang, Y., Sieuwerts, A.M., Look, M.P., Yang, F., Talantov, D., Timmermans, M., Meijer-van Gelder, M.E., Yu, J., et al. (2005). Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679. https://doi.org/10.1016/s0140-6736(05)17947-1.
9. Lehmann, B.D., Bauer, J.A., Chen, X., Sanders, M.E., Chakravarthy, A.B., Shyr, Y., and Pietenpol, J.A. (2011). Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J. Clinical Investigation 121, 2750–2767. https://doi.org/10.1172/jci45014.
10. Ein-Dor, L., Zuk, O., and Domany, E. (2006). Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc. Natl. Acad. Sci. USA 103, 5923–5928. https://doi.org/10.1073/pnas.0601231103.
11. Drier, Y., and Domany, E. (2011). Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes? PLoS One 6, e17795.
12. Font-Clos, F., Zapperi, S., and La Porta, C.A. (2017). Integrative analysis of pathway deregulation in obesity. NPJ Systems Biology and Applications 3, 18–10. https://doi.org/10.1038/s41540-017-0018-z.
13. Drier, Y., Sheffer, M., and Domany, E. (2013). Pathway-based personalized analysis of cancer. Proc. Natl. Acad. Sci. USA 110, 6388–6393. https://doi.org/10.1073/pnas.1219651110.
14. Font-Clos, F., Zapperi, S., and La Porta, C.A. (2021). Classification of triple-negative breast cancers through a Boolean network model of the epithelial-mesenchymal transition. Cell Systems 12, 457–462.
15. Font-Clos, F., Zapperi, S., and La Porta, C.A.M. (2018). Topography of epithelial–mesenchymal plasticity. Proc. Natl. Acad. Sci. USA 115, 5902–5907. https://doi.org/10.1073/pnas.1722609115.
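
The “curse of dimensionality” discussed in this commentary (a transcriptome of roughly Ng = 20,000 genes against a few hundred samples Ns) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the studies cited here: the data are synthetic Gaussian noise, the outcome labels are random, the classifier is a plain least-squares linear fit, and the function name `noise_fit_accuracy` is hypothetical. The point is only that, when features vastly outnumber samples, a classifier can reproduce labels that carry no information at all.

```python
import numpy as np


def noise_fit_accuracy(n_genes, n_train=200, n_test=200, seed=0):
    """Fit a linear classifier to labels that are pure noise.

    Returns (train_accuracy, test_accuracy). The labels are independent of
    the features, so anything well above 0.5 on the training set reflects
    memorization, not biology.
    """
    rng = np.random.default_rng(seed)
    # Synthetic "expression" matrices: rows are samples, columns are genes.
    x_train = rng.standard_normal((n_train, n_genes))
    x_test = rng.standard_normal((n_test, n_genes))
    # Random binary outcomes (+1 / -1), unrelated to the features.
    y_train = rng.choice([-1.0, 1.0], size=n_train)
    y_test = rng.choice([-1.0, 1.0], size=n_test)
    # Least-squares weights (minimum-norm solution when n_genes > n_train).
    weights, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
    train_acc = float(np.mean(np.sign(x_train @ weights) == y_train))
    test_acc = float(np.mean(np.sign(x_test @ weights) == y_test))
    return train_acc, test_acc


if __name__ == "__main__":
    # Whole-transcriptome scale versus a pathway-sized feature set.
    for n_genes in (20_000, 100):
        train_acc, test_acc = noise_fit_accuracy(n_genes)
        print(f"Ng={n_genes}: train={train_acc:.2f}, test={test_acc:.2f}")
```

With Ng = 20,000 and 200 training samples the fit reproduces every random training label while held-out accuracy stays near chance; with a pathway-sized set of 100 features the same fit can no longer memorize all 200 labels. This toy comparison only motivates the move from genes to pathways; it does not implement pathway deregulation scores or ARIADNE.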
