2023 - Aradhya -

Received: 15 June 2023 Revised: 13 July 2023 Accepted: 19 July 2023
DOI: 10.1002/ajmg.c.32057
REVIEW ARTICLE
Applications of artificial intelligence in clinical laboratory genomics
Swaroop Aradhya 1,2 | Flavia M. Facio 1 | Hillery Metz 1 | Toby Manders 1 | Alexandre Colavin 1 | Yuya Kobayashi 1 | Keith Nykamp 1 | Britt Johnson 1 | Robert L. Nussbaum 1,3

1 Invitae Corporation, San Francisco, California, USA
2 Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
3 Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA

Correspondence
Swaroop Aradhya, Invitae Corporation, San Francisco, California, USA.
Email: swaroop.aradhya@invitae.com

Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
KEYWORDS
deep learning, germline genetic testing, in silico prediction algorithms, machine learning,
precision medicine, variant classification
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2023 Invitae Corporation. American Journal of Medical Genetics Part C: Seminars in Medical Genetics published by Wiley Periodicals LLC.
Am J Med Genet. 2023;193C:e32057. wileyonlinelibrary.com/journal/ajmgc
https://doi.org/10.1002/ajmg.c.32057

1 | NECESSITY OF ARTIFICIAL INTELLIGENCE IN GENOMICS

Constrained over the past 50 years to the realms of microscopes, gel electrophoresis, and radiographs, clinical laboratory genomics is now dominated by digital data due to the rapid adoption of next-generation sequencing (NGS). Exome and genome sequencing, and even multigene panels and chromosomal microarrays, have already been performed for tens of millions of individuals around the world, and vast amounts of digitized data from millions of these individuals are available to those interested in mining the data for discoveries or for insights that can improve standards of practice in clinical laboratory genomics. These genomic data are essential in precision medicine, where a variety of complex molecular data are combined with clinical information to diagnose and treat individuals with disease (Rajpurkar et al., 2022; Yu et al., 2018). However, even the amount of extractable data in a single genome (with millions of sequence variants, many copy number variants, somatic mosaicism, and variable expression of transcript isoforms) exceeds the capacity of human beings alone to identify answers to specific questions about health and traits. One solution is to incorporate advanced artificial intelligence (AI) into genomic analysis (Diao et al., 2018).

Following the completion of the Human Genome Project, genomes from multiple human populations and diverse primate and non-primate species have been sequenced and deposited in public databases (e.g., gnomAD), enabling important discoveries through DNA sequence comparisons and assessments of allele frequencies in different populations (Karczewski et al., 2020; Margulies & Birney, 2008). Alongside such advances in understanding DNA sequence variation (genomics) over the past two decades, considerable progress has been made in characterizing protein sequence variation and function (proteomics) and RNA transcript isoforms and their expression patterns (transcriptomics) (GTEx Consortium, 2015; Jumper et al., 2021). Furthermore, the ClinVar database at the National Institutes of Health (www.ncbi.nlm.nih.gov/clinvar/) has amassed more than two million DNA variants observed in individuals undergoing genetic testing for a broad variety of hereditary diseases, providing a rich dataset that has been used to improve and standardize clinical variant classification (Harrison et al., 2016). Collectively, these data are an enormous resource that can inform many aspects of biology and precision medicine, from discovery of gene-disease associations to insight into how variants in a given gene affect molecular mechanisms of disease. These data are also useful in virtually every step of the clinical genetic testing workflow, including effectively capturing relevant target DNA from a patient sample, aligning NGS reads, training and evaluating complex machine learning (ML) tools that can analyze genomic variation, and providing evidence to support categorization of individual DNA variants as disease-causing or benign.

2 | ARTIFICIAL INTELLIGENCE METHODS IN LABORATORY GENOMICS

Paralleling the dramatic evolution in scale and speed of DNA sequencing methods, AI has advanced rapidly since the term was coined and the field born at the Dartmouth Summer Research Project on Artificial Intelligence in 1956 (Artificial Intelligence (AI) Coined at Dartmouth, n.d.). Broadly, AI encompasses the theory and development of intelligent systems: systems that act rationally in response to their inputs (Russell & Norvig, 2021). Decades of nearly exponential growth in computing power (Moore's law) and the commensurate increase in data availability have enabled both theoretical and applied achievements in a variety of AI disciplines, including ML, natural language processing (NLP), and computer vision. As described in the next sections, ML and NLP have particularly powerful applications in clinical laboratory genomics.

[Figure 1 diagram: artificial intelligence (AI); machine learning (ML); deep learning; natural language processing (NLP)]
FIGURE 1 Branches of artificial intelligence. Machine learning (ML) gives computers the ability to learn from examples without being explicitly programmed. Deep learning is a subset of ML that uses artificial neural networks and builds a hierarchy of data representations. Natural language processing (NLP) refers to a range of computing techniques for interpreting and generating human language.

ML is an AI discipline that involves applying statistical models and algorithms to analyze and draw inferences from data (Figure 1). Within clinical laboratory genomics, ML techniques can be applied to tasks as diverse as predicting the effect of an altered amino acid residue in a protein or identifying phenotypically similar groups of patients. In simple terms, an ML-based computer algorithm is trained using a dataset with well-characterized examples, such as known pathogenic or benign DNA variants, to recognize specific patterns in those examples. A trained ML system can then be used to predict patterns from new data (e.g., quantitatively determining if a novel DNA variant appears to be more like a pathogenic variant or more like a benign one). Broadly, ML techniques span a spectrum from supervised methods, in which a true value or label (such as a variant's pathogenicity) is known for each example used for training, to unsupervised methods, in which no such labels are used. The ML toolkit comprises a wide range of statistical and algorithmic approaches, each with its own tradeoffs with respect to model complexity, interpretability, computational and data requirements, and ease of use. Classical techniques (e.g., logistic regression, linear regression, K-nearest neighbors, random forests, programmed decision trees), neural networks (i.e., deep learning), and Bayesian methods (e.g., Gaussian processes, multilevel models) all have suitable applications in clinical laboratory genomics.

NLP is the branch of AI that applies computers to understanding, interpreting, and generating human language, often with deep learning (Figure 1). NLP methods have been used to develop a variety of tools ranging from machine translation (e.g., Google Translate) to voice assistants (e.g., Amazon Alexa) to large language models and chatbots (e.g., GPT-4, ChatGPT) (Box 1). Large language models represent some of the largest and most complex ML models ever developed, with hundreds of billions of trainable parameters and trillions of examples used for training. These types of models have important uses in clinical genomics, such as in text mining and simple chatbots, and are expected to expand rapidly in scope and utility.

BOX 1 ChatGPT on use of AI in genetic medicine. Interacted February 20, 2023.
Question posed: "How can AI improve genetic testing?"
ChatGPT response:
"Artificial intelligence (AI) has the potential to improve genetic testing in several ways. Here are a few examples:
Developing more accurate algorithms for interpreting genetic data: AI can be used to develop sophisticated algorithms that can accurately interpret genetic data. These algorithms can help identify genetic variants associated with diseases, assess an individual's risk of developing a disease, and provide personalized recommendations for treatment or preventive measures.
Identifying novel genetic variants: AI can be used to identify previously unknown genetic variants that are associated with diseases. This can be achieved by analyzing large datasets of genetic information using machine learning techniques.
Streamlining genetic testing workflows: AI can be used to automate many of the steps involved in genetic testing, from sample preparation to data analysis. This can help reduce the turnaround time for test results and increase the efficiency of the testing process.
Enhancing genetic counseling: AI can be used to analyze a patient's genetic information and provide personalized counseling based on their individual risk factors. This can help patients make informed decisions about their health and well-being.
Facilitating the integration of genetic data into electronic health records (EHRs): AI can be used to help integrate genetic data into EHRs, making it easier for healthcare providers to access and utilize this information in patient care.
Overall, AI has the potential to revolutionize genetic testing and improve our ability to diagnose and treat diseases. However, it is important to ensure that AI-based solutions are developed and implemented in a responsible and ethical manner, with appropriate safeguards in place to protect patient privacy and ensure the accuracy and reliability of test results."

Compared to the use of traditional methods, the application of AI technologies represents a critical opportunity for dramatically improving the scalability, accuracy, and utility of clinical genomics. Diverse AI approaches are already being applied in this realm (Ledgister Hanchard et al., 2022): to support the identification of rare diseases through analysis of facial gestalt in the clinic (Gurovich et al., 2019; Hsieh et al., 2022; Myers et al., 2020; Porras et al., 2021), to non-invasively select embryos for preimplantation genetic testing or direct uterine transfer (Barnes et al., 2023; Dimitriadis et al., 2022), and to provide information to individuals with genetic health concerns (Schmidlen et al., 2022; Smith et al., 2023).

Several publications have described the use of AI in genomics in general, mostly within the context of basic research (Diao et al., 2018; Ledgister Hanchard et al., 2022; Libbrecht & Noble, 2015), but an article specifically on the applications of AI in clinical laboratory genomics has become both timely and necessary. Here we discuss new applications of AI within clinical laboratory genomics (Figure 2) and provide real-world data to illustrate their use and value. To preserve clarity in describing the various applications, we discuss use cases for AI rather than the technical details of each AI method.

3 | SEQUENCING AND BIOINFORMATICS

The types and scale of sequencing data have changed dramatically over the past decade, owing to rapid improvements in NGS chemistry and throughput. The massive outputs from NGS demand large computing power to process the data into readable formats for downstream analyses. As an example, the predominant DNA sequencing equipment vendor, Illumina, uses the DRAGEN™ Pipeline to accelerate data processing, relying in part on AI (DRAGEN Sets New Standard for Data Accuracy in PrecisionFDA Benchmark Data, n.d.). Once primary NGS data have been assessed for quality and completeness, GATK HaplotypeCaller has been the benchmark tool for calling variants from aligned sequence reads (DePristo et al., 2011). The critical step of accurately identifying variants is being further improved through newer algorithms, some based on AI methods (Alharbi & Rashid, 2022; Olson et al., 2023; Poplin et al., 2018). For instance, a deep learning method (DeepVariant-AF) developed recently by Google Health considers population allele frequencies from the 1000 Genomes Project and appears to call variants more accurately than prior methods (Chen, Kolesnikov, et al., 2023).

We and others have also developed methods for detecting copy number variants from NGS by applying logistic regression to data from known copy number samples to generate a mean model that provides a baseline for calling actual copy number events in clinically tested samples (Hill & Unckless, 2019; Lincoln et al., 2021; Özden et al., 2022; Truty et al., 2019; Välipakka et al., 2020). Similarly, we use an AI approach to identify which variants from primary sequencing analysis need confirmation through orthogonal methods (Lincoln et al., 2019; unpublished data). In sum, AI is being used both to accelerate data processing and to improve the accuracy of variant calling, including for variant types that have historically been difficult to detect from NGS data (e.g., copy number variants).

4 | CLINICAL CLASSIFICATION OF SEQUENCE VARIANTS

The significance of DNA variants observed in individuals undergoing clinical genetic testing for hereditary disease is determined through a variant classification process based on guidelines prescribed by the American College of Medical Genetics and Genomics (ACMG) (Richards et al., 2015). This process uses a standardized approach for collating diverse types of evidence to classify variants into one of five tiers: pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, or benign. Types of evidence include the nature of the DNA sequence variant and its intragenic location, evolutionary conservation at the variant position, prevalence of the variant in a general population cohort, predicted effect of the variant on the downstream transcribed mRNA or translated protein, presence of protein functional domains at or near the variant position, clinical phenotype of the individual undergoing genetic testing, published reports of that variant or other variants at the same position or nearby positions, and family history. How these types of evidence are interpreted and weighed relative to each other determines the final clinical classification of the observed variant.

Figure 2 content, by stage of the clinical genomics testing process, with examples of supporting AI methods:
Pre-test: Gia and other AI chatbots; pre-test education and information gathering; collecting additional clinical information through NLP; decision trees to identify individuals eligible for genetic testing.
NGS: Illumina DRAGEN pipeline; primary data processing to ensure that raw sequence data meet set parameters.
Lab-specific bioinformatics: custom lab-developed algorithms; assess sequencing quality and identify variants; detect specific classes of variants, e.g., indels, CNVs, expansions; select specific variants for orthogonal confirmation.
Variant interpretation: Invitae Apollo, Fabric GEM, and Illumina Emedgene; mine published literature to support variant interpretation via NLP; Invitae Moon: find links between clinical phenotypes and variants; Invitae EMP: use ML to evaluate evidence-generating models.
Reporting: human-aided automated reporting to clinicians; programmed decision trees for automated inclusion of variant details and clinical summaries.
Post-test: Gia and other AI chatbots; post-test education and information sharing; collecting additional clinical information through NLP.
FIGURE 2 Points in clinical laboratory genomics where artificial intelligence methods can be applied. Examples of publicly available as well as proprietary methods are listed. Gia, Apollo, Moon, and Invitae EMP are trademarks of Invitae; Fabric GEM is a trademark of Fabric Genomics; DRAGEN and Emedgene are trademarks of Illumina. AI, artificial intelligence; CNV, copy number variant; EMP, evidence modeling platform; ML, machine learning; NGS, next-generation sequencing; NLP, natural language processing.

The types of evidence used for variant classification fall into two highly correlated but conceptually distinct classes: evidence demonstrating a deleterious effect of a DNA variant on gene function and evidence demonstrating a detrimental effect of the variant on the health and well-being of an individual carrying that variant. Examples of a variant's deleterious effect on protein function include its impact on protein structure and stability or on enzymatic, receptor, or channel activity as demonstrated through in silico, in vivo, or cellular models. In contrast, examples of a variant's detrimental health effects include clinical observations, such as segregation of a variant with disease in families or presence of the variant in affected patients in different families, as well as population effects over time, such as the degree of evolutionary conservation of a residue in orthologous proteins among different species or allele frequencies of the variant in various human populations.

The greatest source of uncertainty in clinical genetic testing for hereditary disease today stems from our limited ability to accurately predict the functional consequences of protein sequence changes that result from missense DNA variants. This class of variants makes up the largest proportion of those clinically classified as VUSs. We conducted a study of the full spectrum of variants observed in clinical genetic testing and found that 87% of VUSs were missense variants (submitted manuscript), underscoring the urgent need to improve functional prediction methods for this class of variants encountered in clinical laboratory genomics. A smaller proportion of VUSs comprise variants that might affect splicing, protein synthesis, or gene expression (Figure 3).

4.1 | Predicting the effects of DNA variants on protein structure and function

Clinical laboratories typically use a suite of in silico models designed to predict the consequences of DNA sequence variants, and these predictions represent evidence that contributes to classifying variants.
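Before a model's predictions count as evidence, they can be checked gene by gene against variants whose classifications are already known; the Figure 4 caption gives EMP's cutoff as an AUROC of at least 0.80. The Python below is a minimal sketch of such a gate under stated assumptions, not Invitae's EMP code. It assumes each held-out variant carries a model score (higher meaning more likely pathogenic) and a known label (1 = pathogenic, 0 = benign); the function names `auroc` and `genes_passing_gate` and the gene names `GENE_A` and `GENE_B` are hypothetical.

```python
"""Hypothetical per-gene quality gate for an in silico variant-effect model.

Assumption: each held-out variant has a model score (higher = more likely
pathogenic) and a known label (1 = pathogenic, 0 = benign). Illustrative
sketch only; not the EMP implementation described in the text.
"""
from typing import Dict, List, Sequence, Tuple

# Cutoff mirroring the AUROC >= 0.80 gate described for EMP.
AUROC_THRESHOLD = 0.80


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC computed as the Mann-Whitney U probability.

    Equals the chance that a randomly chosen pathogenic variant scores higher
    than a randomly chosen benign one; tied pairs count as 0.5.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign label")
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


def genes_passing_gate(
    held_out: Dict[str, List[Tuple[float, int]]],
    threshold: float = AUROC_THRESHOLD,
) -> Dict[str, float]:
    """Keep only genes where the model separates known labels well enough.

    held_out maps a gene symbol to (score, label) pairs for variants that
    were NOT used to train the model.
    """
    passing: Dict[str, float] = {}
    for gene, pairs in held_out.items():
        scores = [score for score, _ in pairs]
        labels = [label for _, label in pairs]
        value = auroc(scores, labels)
        if value >= threshold:
            passing[gene] = value
    return passing


if __name__ == "__main__":
    # Toy held-out data for two hypothetical genes.
    toy = {
        "GENE_A": [(0.92, 1), (0.81, 1), (0.35, 0), (0.20, 0)],
        "GENE_B": [(0.40, 1), (0.55, 1), (0.60, 0), (0.70, 0)],
    }
    print(genes_passing_gate(toy))  # {'GENE_A': 1.0}
```

In this sketch a gene whose held-out AUROC falls below the cutoff simply contributes no evidence from that model, which matches the text's point that model outputs are used only where they meet a quality threshold. The pairwise loop is quadratic in the number of labeled variants; a rank-based AUROC computation would scale better for large truth sets.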
[Figure 3 panels: >1.6 million patients, with VUS-only findings (32%) versus other (68%); unique VUSs by variant type: missense 87%, non-coding 6%, other 4%, splicing/truncating 2%, CNV 1%]
FIGURE 3 Variant types observed in a large clinical cohort referred for genetic testing. Among 1,689,845 individuals tested using a diagnostic multi-gene panel for a variety of hereditary disorders, 692,227 (41%) had at least one VUS, and 535,385 (31.7%) had VUS-only findings. Across all patients there were 475,284 unique VUSs, 87% of which were missense variants and 2% of which affected splicing.

Our group uses a collection of these evidence-generating models, all of which are trained using ML and validated in ways that ensure high positive predictive values. Predictions from these models are incorporated into variant classification only when a high performance (accuracy) threshold is met. These models together comprise Invitae's Evidence Modeling Platform (EMP) (Figure 4).

In silico algorithms have long been available for predicting the effects of missense changes, including some that leverage ML (e.g., PolyPhen-2, MutationTaster, CADD) (Garcia et al., 2022). However, when these in silico algorithms are applied broadly (i.e., to predict the effects of variants in all coding genes in the genome), their accuracy tends to be inconsistent. Factors contributing to this inconsistency include the relative paucity of definitively classified variants in some genes, inadequate consideration of gene-specific molecular mechanisms of disease, and challenges in ruling out circularity or double counting of redundant data during AI training and evaluation (Ghosh et al., 2017; Grimm et al., 2015). As a result, the ACMG guidelines award little weight to the output of any single algorithm used by itself, and instead recommend the use of multiple algorithms to obtain consensus scores (Richards et al., 2015; Variant Effect Predictors, n.d.).

Recent improvements in algorithms designed to predict the effects of missense variants, especially those developed as ensemble predictors (e.g., REVEL), have led to higher overall accuracy (Pejaver et al., 2022). However, their accuracy is still diminished in certain types of genes because they extrapolate from a limited training dataset to score variants in diverse genes that often do not resemble the training set. To address this limitation, as part of Invitae's EMP (Figure 4) we have developed new ML-based algorithms using carefully curated training sets of pathogenic or benign variants from select genes, and these algorithms show high positive predictive values when applied to evaluating new variants in those same genes (manuscript in preparation). This gene-by-gene training and implementation approach achieves both high performance and systematic incorporation into the overall process of clinical variant classification.

A quantum leap occurred recently when AlphaFold, a deep learning-based algorithm developed by DeepMind, accurately predicted the structures of 100,000 proteins (Jumper et al., 2021). This and similar advances raise an important question: How can such technologies be incorporated into clinical variant classification to accurately indicate the effects of DNA variants on downstream protein structure and function and, ultimately, on health? Some have explored whether evidence from the analysis of protein structures can be used to help classify variants (Caswell et al., 2022). In a more elaborate approach, to support clinical classification of missense variants, our group has developed methods that use AlphaFold output to predict the effects of amino acid substitutions on the molecular stability of resulting proteins (manuscript in preparation). These methods have been incorporated into Invitae's EMP, where the accuracy of the molecular instability scores is assessed and, if the scores meet a strict quality threshold, they are converted into gene-specific calibrated evidence that is used for classifying missense variants. This process for validating and integrating AI-based evidence into variant classification is described in more detail in Figure 4.

Another important piece of evidence that clinical laboratories use to interpret the significance of a DNA variant within a protein-coding region is the degree of evolutionary conservation at the affected DNA sequence and its corresponding amino acid position. A high degree of conservation at a specific amino acid position across many species, or even complete conservation between vertebrates and invertebrates, suggests the necessity of that residue for normal protein structure or function. Simple sequence alignments that help determine the degree of conservation among genomic sequences of different species have been used to predict the consequences (if any) of novel DNA variants. However, it was not until recently that sophisticated in silico tools could demonstrate high accuracy in those predictions. Among these tools, ML-based EVE (evolutionary model of variant effect) in particular has demonstrated strong performance in predicting the pathogenicity of a DNA variant (Figure 5a), as measured by concordance with clinical classifications in the ClinVar database (Figure 5b) and with experimental data (Frazer et al., 2021). Despite its improved proteome-wide performance, EVE still demonstrates variable accuracy in predicting the effects of amino acid changes in certain proteins. To address this limitation, we use Invitae's EMP to systematically evaluate the output from EVE, gene by gene, and use it for variant interpretation only when it meets the required quality threshold (Figure 5c).

Newly described in a publication in Science, Primate-AI3D is a deep learning model that leverages natural variation in primates to make inferences about the impact of DNA variants in humans (Gao et al., 2023). Built on the premise that protein-altering variants commonly found in any non-human primate have been tolerated by natural selection, and are thus likely benign in humans, Primate-AI3D uses deep learning to map genetic variants onto 3D protein structures partially derived from AlphaFold (Jumper et al., 2021) to make predictions about their pathogenicity. Prior to building Primate-AI3D, the authors showed that among missense variants that had clinical classifications in the ClinVar database that were concordant between two or more laboratory submitters and were also common in at least one non-human primate, 99% were classified as benign, demonstrating the validity of their model's premise. Notably, sequence data collected from just 809 individual non-human primates (233 species, some critically endangered) contained 20% more exome variants than all human data currently in gnomAD, illustrating the value of this biodiversity. The authors then used this wealth of primate data and Primate-AI3D to predict the pathogenicity of all possible human missense variants (>70 million variants). In validation tests, Primate-AI3D performed strongly at separating pathogenic and benign ClinVar-classified variants. Thus, while EVE leverages evolutionary conservation broadly across diverse animals, Primate-AI3D may be more powerful in part because it focuses on the taxa most closely related to us, though both methods illustrate the promise of applying AI-based methods to sequence data from diverse species.

[Figure 4 panels: (a) models that score a variant's pathogenicity from population frequency (population), biophysical (in silico), spatial (indirect), molecular and cellular (experimental), and sequence (in silico) data, including allele frequency, molecular, residue, single-cell, and transcriptome inputs; (b) workflow of model generation and learning, performance evaluation, standardization, and clinical variant interpretation (input data, pathogenicity score, benign/VUS/pathogenic); (c) example AI models and their types of evidence across the ACMG tiers (benign, LB, VUS, LP, pathogenic): population allele frequency models (population data), SpliceAI (variant type), NLP-mined clinical data (clinical observations), cellular evidence models (experimental studies), EVE and biophysical and spatial analyses (indirect, computational)]
FIGURE 4 Evidence modeling platform (EMP). Invitae's EMP system is composed of validated machine learning models that evaluate data and make predictions to support variant classification. Machine learning represents a subset of AI. (a) A suite of models assesses unique attributes of a DNA variant and its downstream amino acid change. Each model predicts the pathogenicity or benignity of the variant, and this prediction is included as evidence during variant classification. (b) A standardized process is applied to train the models on known gene-specific input data (e.g., labels such as pathogenic or benign variants). Each model is validated by measuring its performance on a new set of known labels, and only models that reach a high accuracy threshold (area under the receiver operating characteristic curve [AUROC] ≥0.80) are included within EMP to generate predictions for novel variants. (c) A model's prediction serves as evidence that contributes, alongside other types of evidence, toward the final variant classification. Each type of evidence provides different information about a variant's likelihood of being pathogenic or benign. The shaded bars illustrate the degree to which data from each type of evidence are useful for classification. For example, population allele frequency can rule out the possibility of a variant causing disease (e.g., if that allele is found at high frequency in the general population), but on its own cannot indicate pathogenicity. ACMG, American College of Medical Genetics and Genomics; AI, artificial intelligence; EVE, evolutionary model of variant effect; LB, likely benign; LP, likely pathogenic; NLP, natural language processing; VUS, variant of uncertain significance.

Another valuable source of evidence, entirely distinct from in silico models, comes from experimental assessments of the impact of variants on cellular function in vitro. Over the past decade, high-throughput cellular assays, collectively termed multiplex assays of variant effect (MAVEs), have been developed to systematically characterize, on a very large scale, the impact of DNA variants on a wide array of molecular functions, including protein–protein interactions (Araya et al., 2012), enzymatic activity (Romero et al., 2015), [...] within a single experiment. As a result, MAVEs present a useful opportunity to incorporate new, highly informative functional data into variant classification during clinical genetic testing for hereditary disease (Esposito et al., 2019). Notwithstanding the importance and richness of this resource, it remains imperative that MAVE outputs are carefully evaluated gene by gene and experiment by experiment to ensure that the data are suitable for variant classification. To that end, again using Invitae's EMP, we examined 49 MAVE datasets from 22 publications and discovered that 42 were relatively poor at discriminating between benign and pathogenic DNA variants (i.e., the AUROC was <0.80). In contrast, MAVE data related to BRCA1, TP53, and several other genes were extremely useful for variant classification, because those data faithfully represented the effects of DNA variants on gene function in ways that correlated strongly with pathogenicity in individuals diagnosed with cancer (Findlay et al., 2018; Giacomelli et al., 2018). These results underscore the need to consider deleterious effects of variants on protein structure and function as contributing to, and not unilaterally determining, whether a variant is clinically pathogenic.

4.2 | Predicting the effects of variants on gene splicing

In addition to methods that enhance our ability to interpret the clinical significance of amino acid changes in protein sequences, other newly developed methods have become useful for predicting the effect of DNA variants on RNA splicing. Although variants that may alter RNA splicing account for a small proportion of VUSs (Figure 3), these variants are nevertheless responsible for a broad variety of hereditary diseases and must be carefully evaluated during clinical genetic testing (Kamps-Hughes et al., 2023; Karam et al., 2019). RNA sequencing is increasingly used alongside DNA sequencing to identify splice alterations directly; however, transcripts for some genes are poorly expressed in available biological specimens and are therefore not amenable to this parallel testing approach. It is therefore critical that laboratory geneticists have access to robust algorithms that can predict the effects of DNA variants on splicing.

Algorithms to identify splice sites were developed years ago, and

[Figure 5 panels: (a) multiple sequence alignment fed to the EVE model, which outputs predicted pathogenicity of missense variants (benign, pathogenic); (b) EVE score by position, showing ClinVar (likely) pathogenic, (likely) benign, and VUS variants alongside EVE pathogenic, benign, and VUS calls; (c) number of genes (y-axis, 0 to 120) versus AUROC (x-axis, 0.2 to 1.0), highlighting genes with low performance]
FIGURE 5 Using EVE (evolutionary model of variant effect) to predict functional effects of DNA variants within coding regions of genes. (a) The EVE algorithm combines evolutionary conservation of DNA sequences across diverse organisms with a deep learning model to determine the strength of evolutionary constraint at positions within coding regions of genes. Using this measure, it then predicts whether protein sequence changes caused by specific missense variants would be tolerated ("benign") or not tolerated ("pathogenic"). (b) EVE produces continuous scores that are strongly concordant with clinical classifications in the ClinVar database. As a result, these scores
may be useful for reclassifying variants of unknown significance
some were co-opted to predict how specific novel DNA variants might
(VUSs) into benign or pathogenic tiers. (c) The accuracy of EVE
impact splicing. Examples of these algorithms include MaxEntScan,
predictions of variant pathogenicity varies among genes offered for
clinical genetic testing. The level of accuracy, assessed through NNSplice, and GENSCAN. Similar to algorithms designed to predict the
Invitae's Evidence Modeling Platform, is measured for each gene effects of missense variants, most splice-predicting algorithms have
individually. EVE predictions are used for variant classification in shown moderate positive predictive values because they were trained
certain genes only when those predictions meet a high accuracy on limited datasets but then used to predict the potential splicing
threshold, defined as an area under the receiver operating
effects of variants in all genes (Jian et al., 2014; Ohno et al., 2018).
characteristic (AUROC) score of ≥0.80. In the example shown here of
EVE predictions of pathogenicity for variants in 495 genes, However, deep learning-based algorithms with superior performance,
performance exceeded the required threshold for a majority of genes. such as SpliceAI, are now available (Jaganathan et al., 2019). More
recently, the Pangolin deep learning model has been developed,
leveraging splicing data from closely related species to improve upon
regulatory control (Kwasnieski et al., 2012), and protein stability results from SpliceAI (Zeng & Li, 2022). We and others are working on
(Hasle et al., 2019). Unlike previous approaches for determining vari- using Pangolin to annotate exome and genome sequences in the next
ant effects, MAVEs enable the characterization of many DNA variants version of the gnomAD database (Karczewski et al., 2020).
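The gene-level gate described for EVE predictions and for the MAVE datasets, in which model evidence is used only when its AUROC for separating known pathogenic from known benign variants is ≥0.80, can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not Invitae's Evidence Modeling Platform: AUROC is computed from its Mann–Whitney definition (the probability that a pathogenic variant outscores a benign one, with ties counted as one half), and the gene names, scores, and labels are hypothetical toy data.

```python
from typing import Dict, List, Tuple

AUROC_THRESHOLD = 0.80  # gene-level accuracy threshold stated in the text


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a randomly chosen pathogenic variant
    (label 1) scores higher than a randomly chosen benign variant (label 0);
    tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def genes_passing_gate(
    data: Dict[str, Tuple[List[float], List[int]]],
    threshold: float = AUROC_THRESHOLD,
) -> Dict[str, bool]:
    """For each gene, report whether the model's scores clear the AUROC gate
    and may therefore be used as variant classification evidence."""
    return {gene: auroc(s, y) >= threshold for gene, (s, y) in data.items()}


if __name__ == "__main__":
    # Hypothetical toy data: model scores (higher = more likely damaging) and
    # reference labels (1 = pathogenic/likely pathogenic, 0 = benign/likely benign).
    toy = {
        "GENE_A": ([0.95, 0.90, 0.85, 0.20, 0.10, 0.30], [1, 1, 1, 0, 0, 0]),
        "GENE_B": ([0.60, 0.40, 0.55, 0.50, 0.45, 0.65], [1, 1, 1, 0, 0, 0]),
    }
    for gene, ok in genes_passing_gate(toy).items():
        print(gene, "use evidence" if ok else "do not use evidence")
```

With this toy data, GENE_A separates perfectly (AUROC 1.0) and passes, while GENE_B reaches only 4/9 and is excluded. The pairwise loop is chosen to make the definition explicit; for large variant sets the same quantity is normally computed from ranks or with a library routine such as scikit-learn's `roc_auc_score`.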
[Figure 6 data: individuals without ML evidence applied, ~192,000 (55%); individuals with ML evidence applied, ~159,000 (45%), comprising ≥1 B/LB variant in ~85,000 individuals (24%), ≥1 P/LP variant in ~3,500 individuals (1%), both P/LP and B/LB variants in ~340 individuals (<0.5%), and VUS in ~70,500 individuals (20%)]

FIGURE 6 Machine learning-based evidence-generating models contribute to reducing variants of uncertain significance in an appreciable fraction of individuals undergoing genetic testing. B/LB, benign/likely benign; ML, machine learning; P/LP, pathogenic/likely pathogenic; VUS, variant of uncertain significance.

4.3 | Impact of ML-based evidence modeling on providing definitive clinical results

A genetic testing result of a VUS is not clinically actionable. Although in silico methods for predicting the effects of DNA sequence variants have improved, their ability to help laboratory geneticists reduce the number of VUSs reported and resolve uncertainty for patients has not been directly assessed. Therefore, to empirically examine the magnitude of the impact of Invitae's EMP system (Figure 4) on providing definitive answers to individuals referred for clinical genetic testing, we evaluated data from 350,695 patients who underwent diagnostic multigene panel testing over a six-month period for a variety of hereditary diseases. At least one evidence-generating model was available for 3,087 genes, and 159,000 patients (45%) received results that included EMP evidence for at least one reported variant (Figure 6). The application of these models contributed to the classification of at least one B/LB variant in 85,000 individuals (24%), at least one P/LP variant in 3,500 individuals (1%), and both P/LP and B/LB variants in 340 individuals (0.1%) (Figure 6). Among individuals who had at least one variant with EMP evidence applied toward its interpretation, more than 50% received definitive (B/LB or P/LP) classifications.

5 | DECIPHERING CHROMOSOMAL STRUCTURAL VARIANTS

Although most DNA changes associated with genetic disease are sequence variants, large deletions and duplications within chromosomes are also frequently pathogenic, particularly in prenatal and pediatric disorders. These structural variants can be detected using G-banded karyotyping, the mainstay of constitutional cytogenetic testing for nearly three decades, or high-resolution chromosomal microarrays, which became the standard about a decade ago (Miller et al., 2010). Because AI methods have been developed for visual pattern recognition in X-rays, computed tomography scans, and stained tissue slices, one might predict that these methods could also be applied to the analysis of chromosomal karyotypes for constitutional rearrangements or to the analysis of tumor tissue for chromosomal rearrangements. To date, little to no AI appears to be used to routinely analyze chromosomal karyotypes for constitutional rearrangements (Tseng et al., 2023), although various efforts have been made to decipher chromosomal rearrangements in cancer specimens, such as karyotypes of hematologic malignancies (Bokhari et al., 2022; Cox et al., 2022; Vajen et al., 2022; Walter et al., 2021). As genomic analysis increasingly shifts toward molecular approaches, even for chromosomal disorders, AI methods are being developed to identify chromosomal deletions, duplications, and other types of rearrangements from NGS data directly (Lin et al., 2022; Popic et al., 2023).

6 | MINING PUBLISHED LITERATURE OR ELECTRONIC HEALTH RECORDS

NLP has become ubiquitous in modern society, including in genomic medicine (Simmons et al., 2016). Important uses, such as mining published literature or electronic health records for clinical phenotype information, are expected to improve genetics research, innovation, and clinical diagnostics (Birgmeier et al., 2020; Luo et al., 2021; Son et al., 2018). One powerful application is within the context of rapid whole genome sequencing in the neonatal intensive care unit for children with suspected genetic disease. AI methods, including NLP-based text mining, have facilitated rapid identification of the molecular causes of disease in newborns (Clark et al., 2019; Owen et al., 2022; Peterson et al., 2023), significantly reducing the time to diagnosis and the associated costs. Because of the advantages of such AI-based approaches, they could conceivably be applied to screening of newborns to identify those at risk for severe disease (Kingsmore et al., 2022). Similarly, AI-based approaches for virtual screening for cardiovascular conditions, including familial hypercholesterolemia, have been explored by mining electronic health records for phenotypic information suggestive of elevated risk of disease (Pina et al., 2020; Safarova et al., 2016; Sheth et al., 2021). Finally, clinical laboratories use NLP to rapidly identify relevant evidence (e.g., data on genotypes or clinical phenotypes) from the published literature that helps to classify new variants observed in patients or resolve variants of uncertain significance. A proliferation of such examples of applied NLP in genomic medicine should be expected in the coming years.

7 | CORRELATING GENOTYPES AND PHENOTYPES FOR CLINICAL DIAGNOSES

Diagnostic rates in hereditary disease, particularly involving neurodevelopmental disability, have improved substantially in recent years
thanks to AI-based approaches. Clinical whole exome sequencing (WES) and whole genome sequencing require robust tools to sift through the thousands of variants typically found in an individual and identify the one or few variants responsible for that individual's clinical condition. Roughly a dozen years ago, clinical analysis of WES relied primarily on manual analysis due to a paucity of powerful software tools that could pull together diverse evidence types, including genotype–disease annotations, relevant published medical literature, genome sequence annotation resources, and predictions from in silico modeling algorithms. Various sophisticated software platforms for analyzing WES have since been developed, many of which incorporate AI (e.g., Invitae's Moon™, Fabric GEM™, Illumina's Emedgene™, FindZebra) (De La Vega et al., 2021; Dragusin et al., 2013; Meng et al., 2023; O'Brien et al., 2022; Wright et al., 2023). These platforms have become faster and more accurate by incorporating improved genotype–phenotype annotations from the Human Phenotype Ontology, the Monarch Initiative, DisGeNET, and other research efforts (Köhler et al., 2021; Pilehvar et al., 2022; Piñero et al., 2020; Robinson et al., 2008, 2014; Shefchek et al., 2020); by refining the heuristics used to analyze WES (by mimicking the analysis processes used by experienced clinical laboratory geneticists); and by deploying NLP to mine published literature for phenotype and DNA variant information that could be relevant to identifying the molecular cause of an individual's clinical condition. In addition, global sharing of clinical WES data among clinical labs, clinicians, and researchers has accelerated the discovery of disease genes, further fueling improvements in diagnostic rates of WES (Boycott et al., 2022; Kirkpatrick et al., 2015; Osmond et al., 2022).

8 | PRE-TEST AND POST-TEST APPLICATIONS

As genomic information becomes more voluminous and complex (with NGS panels, exome or genome sequencing, family variant testing, follow-up testing to resolve VUSs, and the need to confirm results in certain circumstances, e.g., in non-invasive prenatal screening), it is imperative that patients and clinicians have access to pre-test and post-test education and other information resources. A relatively new type of resource that has demonstrated significant benefits in this regard is the interactive AI-based chatbot (Nazareth, Hayward, et al., 2021; Siglen et al., 2022). A chatbot can be built either with simple rules to respond to expected questions or, on a more sophisticated level, as a generative large language model that can self-learn and improve its performance over time.

The arrival of ChatGPT has understandably created widespread excitement and, at the same time, trepidation about what it may eventually mean for human participation in different work environments. In the healthcare context, after more carefully designed trials and validations are completed, interactive tools like ChatGPT are expected to become ubiquitous, powerful tools for identifying individuals who should consider genetic testing due to their elevated risk profiles (Heald et al., 2021; Nazareth, Nussbaum, et al., 2021), for exchanging information during the informed consent process (Schmidlen et al., 2019), for sharing information between patients and their relatives, healthcare providers, and others (Chavez-Yenter et al., 2021), and for providing education before and after genetic testing (Chavez-Yenter et al., 2021). For instance, a recent publication (Kurian et al., 2023) describes a disappointingly low rate of germline genetic testing (only 6.8%) among individuals with a cancer diagnosis, despite existing guidelines that recommend testing for such individuals. The authors recommend AI-supported chatbots as one approach to increasing rates of genetic testing in these patients. As chatbots become more sophisticated and natural (human-like), users may be unable to distinguish between chatbots and humans. Some have suggested that patients may even prefer chatbots because of their inherent ability to better modulate responses to a wide range of user needs and personalities, and because of the high quality of their interactions with users (Ayers et al., 2023). While additional research is still needed to understand in detail how physicians may benefit from the use of AI chatbots within their clinical genetics workflows (e.g., Smith et al., 2023), the breadth of applications for AI chatbots in healthcare will likely increase overall. As we described recently (Snir et al., 2021), the successful integration of genomics into all areas of medicine will in part require the use of chatbots and other software tools, many of which will be developed using AI.

9 | CONSIDERATIONS FOR USE OF AI IN CLINICAL LABORATORY GENOMICS

The deployment of AI models for clinical laboratory genomics presents unique opportunities, whether those models are designed for classification and reporting of sequence variants, chromosomal changes or, perhaps in the future, other molecular changes (e.g., transcriptional, epigenetic, or proteome changes). There are several pitfalls, however, that need to be recognized. First, incorrect assumptions or inappropriate training sets used during the development or validation of AI models could lead to spurious results (Grimm et al., 2015). For example, if models that contribute evidence toward classifying variants rely too heavily on determining whether a variant is deleterious at the molecular level, without complementary evidence of pathogenicity (e.g., case reports, population frequency, evolutionary conservation), and the resulting classifications are submitted to ClinVar, future researchers may be at risk of using AI-corrupted training data to develop new AI models. This potential pitfall may be mitigated by carefully recognizing which classifications in ClinVar are AI-supported and by using only high-quality inter-lab consensus submissions derived from multiple types of evidence.

A second challenge to the adoption of AI models is determining how much weight should be assigned to evidence generated from different models, and how that evidence should be considered in combination with other types of evidence during variant classification. Although some methods are being proposed to calibrate in silico models and determine the value of their outputs (Pejaver et al., 2022),
integrating the full spectrum of evidence generated from increasingly diverse AI-based models into variant classification will require a systematic approach. One solution could be to develop a holistic and quantitative variant classification framework, for example by using probabilistic Bayesian inference that allows systematic assignment of weight to each type of evidence and combinations of evidence to output an overall probability of pathogenicity.

A third challenge facing AI approaches to variant classification is that the use of genomic datasets with underrepresentation of individuals of non-Northern European ancestry could perpetuate inadequacies in the delivery of definitive variant classification for such groups. For example, it is well documented that individuals with some genetic ancestries receive more VUSs than others because of a paucity of population frequency data in public databases such as gnomAD (Appelbaum et al., 2022; Florentine et al., 2022). Because some genetic ancestry groups are better represented in public databases, they are also better represented in the data used for training and validating AI models for variant interpretation. With the development of any AI model for clinical laboratory genomics, it is necessary to exercise caution and monitor for potential reinforcement of genetic ancestry biases in AI-generated evidence. The deployment of AI models that are informed by biological principles and are agnostic to genetic ancestry should help to correct such biases and lead to more equity in variant classification.

10 | CONCLUSIONS AND FUTURE DIRECTIONS

The number of known monogenic diseases has increased rapidly over the past decade and is expected to climb further. Importantly, the number of genetic diseases with available therapies and precision clinical management is also increasing (Bick et al., 2021). AI methods will continue to improve and be used to rapidly diagnose hereditary diseases, identify those who can benefit from available therapies, recognize those at risk of genetic disease, and fuel further discovery of novel disease genes.

AI-based resources will also become richer as genomics databases grow in both volume and quality. Some of this data accumulation will occur through basic research aimed at further characterizing the structure of the human genome (Liao et al., 2023), and through insights into less understood elements such as promoters, enhancers, alternative transcripts, functional non-coding regions, methylated sites, and heterochromatin regions. Tens of millions of new exome and genome sequences are expected to soon be added to public databases through international research efforts (Manolio et al., 2020). Importantly, these studies are aggregating genomic data from diverse populations around the world (Wang et al., 2022), thereby improving the representation of genetic ancestry groups and the accuracy of studies on genomic contributions to complex diseases and other clinical questions. Complementing this, phenomics research with electronic health records can reveal important gene–disease associations that may be translated into useful screening and diagnostic approaches to reduce disease burden (Hebbring, 2019; Linder et al., 2021; Movaghar et al., 2021; Robinson et al., 2018; Yang et al., 2022). In addition, large-scale population data from genome-wide association studies are increasingly being leveraged to identify polygenic risk for disease, and AI-based methods are being used to conduct these types of studies and develop risk scores (Nicholls et al., 2020; Steinfeldt et al., 2022). Together, often with the support of AI methods, these complementary advances in genomics and phenomics are expected to catalyze faster drug discovery for hereditary diseases (Alves et al., 2022; Boniolo et al., 2021; Visibelli et al., 2023) and, importantly, to extend these benefits to groups historically underserved by clinical genomics.

Although this article has largely focused on AI applications in relation to hereditary disease, AI approaches for clinical laboratory genomics in relation to precision oncology likely hold equal promise for improving human health (Stenzinger et al., 2022). This topic is too vast in scope to be covered here and deserves its own article. In brief, however, AI may be used to investigate the complex landscape of somatic DNA variants detectable in tumors (i.e., through NGS-based cancer genome profiling). Applying powerful AI methods in this manner promises to uncover relationships between cancer risk or diagnosis and multi-omic data derived from DNA sequencing, RNA sequencing, epigenetic signatures of methylation, histology, radiology, and clinical observations (Espín-Pérez et al., 2022; Silvestri et al., 2023; Sun et al., 2023). It also promises to help in monitoring molecular residual disease at different points in an individual's cancer treatment journey (Chen, Zhang, et al., 2023) and in identifying novel biomarkers detectable by liquid biopsy. All of these applications would represent important advancements that are urgently needed in medical oncology.

AI methods are becoming increasingly and urgently necessary as the use of genomic medicine increases rapidly across the world. However, it is worth emphasizing that AI at present still has limited scope of use in clinical laboratory genomics due to continuously evolving genomic databases, a steady pace of discoveries of gene–disease relationships, and limited deployment for providing education and information (Luca et al., 2023). Expert clinicians and laboratory geneticists will therefore need to carefully oversee the emerging use of AI in genomic medicine and help other professionals incorporate genomic information appropriately for clinical management of their patients (Solomon, 2022). It is also important to note the potential lost opportunity costs of delaying the use of AI, since that would hinder diagnosis and treatment for millions of individuals who need conclusive genetic testing results, even as the data that could provide answers are available and waiting to be analyzed. Because the number of AI models and the areas of their potential use are expected to grow rapidly, even exponentially, it is important to standardize how they are used in healthcare. Key guiding principles for the scientific and clinical community need to be developed as genomic medicine evolves to become universal across healthcare specialties (Badal et al., 2023; Rajpurkar et al., 2022), and as the need for individuals to interact with their genomic information for different purposes through their lifespans becomes a reality.
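Section 9 proposes a quantitative classification framework in which probabilistic Bayesian inference assigns weight to each type of evidence and outputs an overall probability of pathogenicity. As a hedged illustration only, the Python sketch below shows the simplest form of that idea: each evidence item is expressed as a likelihood ratio (how much more likely the observation is if the variant is pathogenic than if it is benign), the ratios are multiplied onto the prior odds, and the result is converted back to a probability. The prior of 0.10, the evidence names, and the likelihood-ratio values are hypothetical, and multiplying ratios assumes the evidence items are conditionally independent, an assumption that correlated AI-derived evidence may violate.

```python
from typing import Dict


def posterior_probability(prior: float, likelihood_ratios: Dict[str, float]) -> float:
    """Combine evidence in odds space, assuming the items are conditionally
    independent: posterior odds = prior odds x product of likelihood ratios."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for name, lr in likelihood_ratios.items():
        if lr <= 0:
            raise ValueError(f"likelihood ratio for {name!r} must be positive")
        odds *= lr  # LR > 1 favors pathogenic, LR < 1 favors benign
    return odds / (1.0 + odds)  # convert odds back to a probability


if __name__ == "__main__":
    # Hypothetical evidence for a single variant; all values are illustrative.
    evidence = {
        "missense_model_score": 4.0,
        "functional_assay": 18.0,
        "absent_from_population_databases": 2.0,
    }
    p = posterior_probability(prior=0.10, likelihood_ratios=evidence)
    print(f"posterior probability of pathogenicity: {p:.3f}")  # 16/17, about 0.941
```

Here the prior odds of 1:9 are multiplied by 4 × 18 × 2 = 144, giving posterior odds of 16 and a probability of about 0.94. Working in log-odds (summing log likelihood ratios) is equivalent and convenient when many items are combined; the difficult part in practice is calibrating each likelihood ratio against reference variants, which is the kind of calibration work cited in Section 9 (Pejaver et al., 2022).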
AUTHOR CONTRIBUTIONS
All authors have reviewed, discussed, and agreed to their individual contributions to this manuscript. All authors contributed to conceptualizing, writing, editing, or reviewing the manuscript.

ACKNOWLEDGEMENTS
We thank Dr. Elaine Chen of Invitae for analyzing data for figures and Kerry Aradhya of Invitae for scientific editing.

CONFLICT OF INTEREST STATEMENT
All authors are employees and stockholders of Invitae.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

ORCID
Swaroop Aradhya https://orcid.org/0000-0001-6219-2931
Flavia M. Facio https://orcid.org/0000-0002-7472-471X
Hillery Metz https://orcid.org/0000-0002-8246-3513
Alexandre Colavin https://orcid.org/0000-0002-4996-5378
Yuya Kobayashi https://orcid.org/0000-0002-8202-2101
Keith Nykamp https://orcid.org/0000-0002-1389-5320
Britt Johnson https://orcid.org/0000-0003-1883-5201
Robert L. Nussbaum https://orcid.org/0000-0003-3445-8880

REFERENCES
Alharbi, W. S., & Rashid, M. (2022). A review of deep learning applications in human genomics using next-generation sequencing data. Human Genomics, 16(1), 26. https://doi.org/10.1186/s40246-022-00396-x
Alves, V. M., Korn, D., Pervitsky, V., Thieme, A., Capuzzi, S. J., Baker, N., Chirkova, R., Ekins, S., Muratov, E. N., Hickey, A., & Tropsha, A. (2022). Knowledge-based approaches to drug discovery for rare diseases. Drug Discovery Today, 27(2), 490–502. https://doi.org/10.1016/j.drudis.2021.10.014
Appelbaum, P. S., Burke, W., Parens, E., Zeevi, D. A., Arbour, L., Garrison, N. A., Bonham, V. L., & Chung, W. K. (2022). Is there a way to reduce the inequity in variant interpretation on the basis of ancestry? American Journal of Human Genetics, 109(6), 981–988. https://doi.org/10.1016/j.ajhg.2022.04.012
Araya, C. L., Fowler, D. M., Chen, W., Muniez, I., Kelly, J. W., & Fields, S. (2012). A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proceedings of the National Academy of Sciences of the United States of America, 109(42), 16858–16863. https://doi.org/10.1073/pnas.1209751109
Artificial intelligence (AI) coined at Dartmouth. (n.d.). Retrieved May 30, 2023, from: https://home.dartmouth.edu/about/artificial-intelligence-ai-coined-dartmouth
Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183, 589–596. https://doi.org/10.1001/jamainternmed.2023.1838
Badal, K., Lee, C. M., & Esserman, L. J. (2023). Guiding principles for the responsible development of artificial intelligence tools for healthcare. Communications Medicine, 3(1), 47. https://doi.org/10.1038/s43856-023-00279-9
Barnes, J., Brendel, M., Gao, V. R., Rajendran, S., Kim, J., Li, Q., Malmsten, J. E., Sierra, J. T., Zisimopoulos, P., Sigaras, A., Khosravi, P., Meseguer, M., Zhan, Q., Rosenwaks, Z., Elemento, O., Zaninovic, N., & Hajirasouliha, I. (2023). A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy: A retrospective model development and validation study. The Lancet. Digital Health, 5(1), e28–e40. https://doi.org/10.1016/S2589-7500(22)00213-8
Bick, D., Bick, S. L., Dimmock, D. P., Fowler, T. A., Caulfield, M. J., & Scott, R. H. (2021). An online compendium of treatable genetic disorders. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics, 187(1), 48–54. https://doi.org/10.1002/ajmg.c.31874
Birgmeier, J., Deisseroth, C. A., Hayward, L. E., Galhardo, L. M. T., Tierno, A. P., Jagadeesh, K. A., Stenson, P. D., Cooper, D. N., Bernstein, J. A., Haeussler, M., & Bejerano, G. (2020). AVADA: Toward automated pathogenic variant evidence retrieval directly from the full-text literature. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 22(2), 362–370. https://doi.org/10.1038/s41436-019-0643-6
Bokhari, Y., Alhareeri, A., Aljouie, A., Alkhaldi, A., Rashid, M., Alawad, M., Alhassnan, R., Samargandy, S., Panahi, A., Heidrich, W., & Arodz, T. (2022). ChromoEnhancer: An artificial-intelligence-based tool to enhance neoplastic karyograms as an aid for effective analysis. Cells, 11(14), 2244. https://doi.org/10.3390/cells11142244
Boniolo, F., Dorigatti, E., Ohnmacht, A. J., Saur, D., Schubert, B., & Menden, M. P. (2021). Artificial intelligence in early drug discovery enabling precision medicine. Expert Opinion on Drug Discovery, 16(9), 991–1007. https://doi.org/10.1080/17460441.2021.1918096
Boycott, K. M., Azzariti, D. R., Hamosh, A., & Rehm, H. L. (2022). Seven years since the launch of the Matchmaker Exchange: The evolution of genomic matchmaking. Human Mutation, 43(6), 659–667. https://doi.org/10.1002/humu.24373
Caswell, R. C., Gunning, A. C., Owens, M. M., Ellard, S., & Wright, C. F. (2022). Assessing the clinical utility of protein structural analysis in genomic variant classification: Experiences from a diagnostic laboratory. Genome Medicine, 14(1), 77. https://doi.org/10.1186/s13073-022-01082-2
Chavez-Yenter, D., Kimball, K. E., Kohlmann, W., Lorenz Chambers, R., Bradshaw, R. L., Espinel, W. F., Flynn, M., Gammon, A., Goldberg, E., Hagerty, K. J., Hess, R., Kessler, C., Monahan, R., Temares, D., Tobik, K., Mann, D. M., Kawamoto, K., Del Fiol, G., Buys, S. S., … Kaphingst, K. A. (2021). Patient interactions with an automated conversational agent delivering pretest genetics education: Descriptive study. Journal of Medical Internet Research, 23(11), e29447. https://doi.org/10.2196/29447
Chen, G., Zhang, J., Fu, Q., Taly, V., & Tan, F. (2023). Integrative analysis of multi-omics data for liquid biopsy. British Journal of Cancer, 128(4), 505–518. https://doi.org/10.1038/s41416-022-02048-2
Chen, N.-C., Kolesnikov, A., Goel, S., Yun, T., Chang, P.-C., & Carroll, A. (2023). Improving variant calling using population data and deep learning. BMC Bioinformatics, 24(1), 197. https://doi.org/10.1186/s12859-023-05294-0
Clark, M. M., Hildreth, A., Batalov, S., Ding, Y., Chowdhury, S., Watkins, K., Ellsworth, K., Camp, B., Kint, C. I., Yacoubian, C., Farnaes, L., Bainbridge, M. N., Beebe, C., Braun, J. J. A., Bray, M., Carroll, J., Cakici, J. A., Caylor, S. A., Clarke, C., … Kingsmore, S. F. (2019). Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Science Translational Medicine, 11(489), eaat6177. https://doi.org/10.1126/scitranslmed.aat6177
Cox, A., Park, C., Koduru, P., Wilson, K., Weinberg, O., Chen, W., García, R., & Kim, D. (2022). Automated classification of cytogenetic abnormalities in hematolymphoid neoplasms. Bioinformatics, 38(5), 1420–1426. https://doi.org/10.1093/bioinformatics/btab822
De La Vega, F. M., Chowdhury, S., Moore, B., Frise, E., McCarthy, J., Hernandez, E. J., Wong, T., James, K., Guidugli, L., Agrawal, P. B., Genetti, C. A., Brownstein, C. A., Beggs, A. H., Löscher, B.-S., Franke, A., Boone, B., Levy, S. E., Õunap, K., Pajusalu, S., … Kingsmore, S. F. (2021). Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Medicine, 13(1), 153. https://doi.org/10.1186/s13073-021-00965-0
DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., Philippakis, A. A., del Angel, G., Rivas, M. A., Hanna, M., McKenna, A., Fennell, T. J., Kernytsky, A. M., Sivachenko, A. Y., Cibulskis, K., Gabriel, S. B., Altshuler, D., & Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. https://doi.org/10.1038/ng.806
Diao, J. A., Kohane, I. S., & Manrai, A. K. (2018). Biomedical informatics and machine learning for clinical genomics. Human Molecular Genetics, 27(R1), R29–R34. https://doi.org/10.1093/hmg/ddy088
Dimitriadis, I., Zaninovic, N., Badiola, A. C., & Bormann, C. L. (2022). Artificial intelligence in the embryology laboratory: A review. Reproductive Biomedicine Online, 44(3), 435–448. https://doi.org/10.1016/j.rbmo.2021.11.003
DRAGEN sets new standard for data accuracy in PrecisionFDA benchmark data. (n.d.). Retrieved May 30, 2023, from: https://www.illumina.com/science/genomics-research/articles/dragen-shines-again-precisionfda-truth-challenge-v2.html
Dragusin, R., Petcu, P., Lioma, C., Larsen, B., Jørgensen, H. L., Cox, I. J., Hansen, L. K., Ingwersen, P., & Winther, O. (2013). FindZebra: A search engine for rare diseases. International Journal of Medical Informatics, 82(6), 528–538. https://doi.org/10.1016/j.ijmedinf.2013.01.005
Espín-Pérez, A., Brennan, K., Ediriwickrema, A. S., Gevaert, O., Lossos, I. S., & Gentles, A. J. (2022). Peripheral blood DNA methylation profiles predict future development of B-cell non-Hodgkin lymphoma. NPJ Precision Oncology, 6(1), 53. https://doi.org/10.1038/s41698-022-00295-3
Esposito, D., Weile, J., Shendure, J., Starita, L. M., Papenfuss, A. T., Roth, F. P., Fowler, D. M., & Rubin, A. F. (2019). MaveDB: An open-source platform to distribute and interpret data from multiplexed
Giacomelli, A. O., Yang, X., Lintner, R. E., McFarland, J. M., Duby, M., Kim, J., Howard, T. P., Takeda, D. Y., Ly, S. H., Kim, E., Gannon, H. S., Hurhula, B., Sharpe, T., Goodale, A., Fritchman, B., Steelman, S., Vazquez, F., Tsherniak, A., Aguirre, A. J., … Hahn, W. C. (2018). Mutational processes shape the landscape of TP53 mutations in human cancer. Nature Genetics, 50(10), 1381–1387. https://doi.org/10.1038/s41588-018-0204-y
Grimm, D. G., Azencott, C.-A., Aicheler, F., Gieraths, U., MacArthur, D. G., Samocha, K. E., Cooper, D. N., Stenson, P. D., Daly, M. J., Smoller, J. W., Duncan, L. E., & Borgwardt, K. M. (2015). The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Human Mutation, 36(5), 513–523. https://doi.org/10.1002/humu.22768
GTEx Consortium. (2015). Human genomics. The genotype-tissue expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science, 348(6235), 648–660. https://doi.org/10.1126/science.1262110
Gurovich, Y., Hanani, Y., Bar, O., Nadav, G., Fleischer, N., Gelbman, D., Basel-Salmon, L., Krawitz, P. M., Kamphausen, S. B., Zenker, M., Bird, L. M., & Gripp, K. W. (2019). Identifying facial phenotypes of genetic disorders using deep learning. Nature Medicine, 25(1), 60–64. https://doi.org/10.1038/s41591-018-0279-0
Harrison, S. M., Riggs, E. R., Maglott, D. R., Lee, J. M., Azzariti, D. R., Niehaus, A., Ramos, E. M., Martin, C. L., Landrum, M. J., & Rehm, H. L. (2016). Using ClinVar as a resource to support variant interpretation. Current Protocols in Human Genetics, 89, 8.16.1–8.16.23. https://doi.org/10.1002/0471142905.hg0816s89
Hasle, N., Matreyek, K. A., & Fowler, D. M. (2019). The impact of genetic variants on PTEN molecular functions and cellular phenotypes. Cold Spring Harbor Perspectives in Medicine, 9(11), a036228. https://doi.org/10.1101/cshperspect.a036228
Heald, B., Keel, E., Marquard, J., Burke, C. A., Kalady, M. F., Church, J. M., Liska, D., Mankaney, G., Hurley, K., & Eng, C. (2021). Using chatbots to screen for heritable cancer syndromes in patients undergoing routine colonoscopy. Journal of Medical Genetics, 58(12), 807–814. https://
assays of variant effect. Genome Biology, 20(1), 223. https://doi.org/ doi.org/10.1136/jmedgenet-2020-107294
10.1186/s13059-019-1845-6 Hebbring, S. (2019). Genomic and phenomic research in the 21st century.
Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Trends in Genetics, 35(1), 29–41. https://doi.org/10.1016/j.tig.2018.
Gasperini, M., Janizek, J. D., Huang, X., Starita, L. M., & Shendure, J. 09.007
(2018). Accurate classification of BRCA1 variants with saturation Hill, T., & Unckless, R. L. (2019). A deep learning approach for detecting
genome editing. Nature, 562(7726), 217–222. https://doi.org/10. copy number variation in next-generation sequencing data. G3, 9(11),
1038/s41586-018-0461-z 3575–3582. https://doi.org/10.1534/g3.119.400596
Florentine, M. M., Rouse, S. L., Stephans, J., Conrad, D., Czechowicz, J., Hsieh, T.-C., Bar-Haim, A., Moosa, S., Ehmke, N., Gripp, K. W., Pantel, J. T.,
Matthews, I. R., Meyer, A. K., Nadaraja, G. S., Parikh, R., Virbalas, J., Danyel, M., Mensah, M. A., Horn, D., Rosnev, S., Fleischer, N.,
Weinstein, J. E., & Chan, D. K. (2022). Racial and ethnic disparities in Bonini, G., Hustinx, A., Schmid, A., Knaus, A., Javanmardi, B.,
diagnostic efficacy of comprehensive genetic testing for sensorineural Klinkhammer, H., Lesmann, H., Sivalingam, S., … Krawitz, P. M. (2022).
hearing loss. Human Genetics, 141(3–4), 495–504. https://doi.org/10. GestaltMatcher facilitates rare disease matching using facial pheno-
1007/s00439-021-02338-4 type descriptors. Nature Genetics, 54(3), 349–357. https://doi.org/10.
Frazer, J., Notin, P., Dias, M., Gomez, A., Min, J. K., Brock, K., Gal, Y., & 1038/s41588-021-01010-x
Marks, D. S. (2021). Disease variant prediction with deep generative Jaganathan, K., Kyriazopoulou Panagiotopoulou, S., McRae, J. F.,
models of evolutionary data. Nature, 599(7883), 91–95. https://doi. Darbandi, S. F., Knowles, D., Li, Y. I., Kosmicki, J. A., Arbelaez, J.,
org/10.1038/s41586-021-04043-8 Cui, W., Schwartz, G. B., Chow, E. D., Kanterakis, E., Gao, H., Kia, A.,
Gao, H., Hamp, T., Ede, J., Schraiber, J. G., McRae, J., Singer-Berk, M., Batzoglou, S., Sanders, S. J., & Farh, K. K.-H. (2019). Predicting splicing
Yang, Y., Dietrich, A. S. D., Fiziev, P. P., Kuderna, L. F. K., Sundaram, L., from primary sequence with deep learning. Cell, 176(3), 535–548.e24.
Wu, Y., Adhikari, A., Field, Y., Chen, C., Batzoglou, S., Aguet, F., https://doi.org/10.1016/j.cell.2018.12.015
Lemire, G., Reimers, R., … Farh, K. K.-H. (2023). The landscape of toler- Jian, X., Boerwinkle, E., & Liu, X. (2014). In silico tools for splicing defect
ated genetic variation in humans and primates. Science, 380(6648), prediction: A survey from the viewpoint of end users. Genetics in Medi-
eabn8153. https://doi.org/10.1126/science.abn8197 cine: Official Journal of the American College of Medical Genetics, 16(7),
Garcia, F. A. d. O., de Andrade, E. S., & Palmero, E. I. (2022). Insights on 497–503. https://doi.org/10.1038/gim.2013.176
variant analysis in silico tools for pathogenicity prediction. Frontiers in Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O.,
Genetics, 13, 1010327. https://doi.org/10.3389/fgene.2022.1010327 Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A.,
Ghosh, R., Oak, N., & Plon, S. E. (2017). Evaluation of in silico algorithms Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B.,
for use with ACMG/AMP clinical variant interpretation guidelines. Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate
Genome Biology, 18(1), 225. https://doi.org/10.1186/s13059-017- protein structure prediction with AlphaFold. Nature, 596(7873), 583–
1353-5 589. https://doi.org/10.1038/s41586-021-03819-2
Kamps-Hughes, N., Carlton, V. E. H., Fresard, L., Osazuwa, S., Starks, E., Vincent, J. J., Albritton, S., Nussbaum, R. L., & Nykamp, K. (2023). A systematic method for detecting abnormal mRNA splicing and assessing its clinical impact in individuals undergoing genetic testing for hereditary cancer syndromes. The Journal of Molecular Diagnostics, 25(3), 156–167. https://doi.org/10.1016/j.jmoldx.2022.12.002
Karam, R., Conner, B., LaDuca, H., McGoldrick, K., Krempely, K., Richardson, M. E., Zimmermann, H., Gutierrez, S., Reineke, P., Hoang, L., Allen, K., Yussuf, A., Farber-Katz, S., Rana, H. Q., Culver, S., Lee, J., Nashed, S., Toppmeyer, D., Collins, D., … Chao, E. (2019). Assessment of diagnostic outcomes of RNA genetic testing for hereditary cancer. JAMA Network Open, 2(10), e1913900. https://doi.org/10.1001/jamanetworkopen.2019.13900
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D. P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes, D., Singer-Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A., … MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581(7809), 434–443. https://doi.org/10.1038/s41586-020-2308-7
Kingsmore, S. F., Smith, L. D., Kunard, C. M., Bainbridge, M., Batalov, S., Benson, W., Blincow, E., Caylor, S., Chambers, C., Del Angel, G., Dimmock, D. P., Ding, Y., Ellsworth, K., Feigenbaum, A., Frise, E., Green, R. C., Guidugli, L., Hall, K. P., Hansen, C., … Defay, T. (2022). A genome sequencing system for universal newborn screening, diagnosis, and precision medicine for severe genetic diseases. American Journal of Human Genetics, 109(9), 1605–1619. https://doi.org/10.1016/j.ajhg.2022.08.003
Kirkpatrick, B. E., Riggs, E. R., Azzariti, D. R., Miller, V. R., Ledbetter, D. H., Miller, D. T., Rehm, H., Martin, C. L., Faucett, W. A., & Resource, C. G. (2015). GenomeConnect: Matchmaking between patients, clinical laboratories, and researchers to improve genomic knowledge. Human Mutation, 36(10), 974–978. https://doi.org/10.1002/humu.22838
Köhler, S., Gargano, M., Matentzoglu, N., Carmody, L. C., Lewis-Smith, D., Vasilevsky, N. A., Danis, D., Balagura, G., Baynam, G., Brower, A. M., Callahan, T. J., Chute, C. G., Est, J. L., Galer, P. D., Ganesan, S., Griese, M., Haimel, M., Pazmandi, J., Hanauer, M., … Robinson, P. N. (2021). The human phenotype ontology in 2021. Nucleic Acids Research, 49(D1), D1207–D1217. https://doi.org/10.1093/nar/gkaa1043
Kurian, A. W., Abrahamse, P., Furgal, A., Ward, K. C., Hamilton, A. S., Hodan, R., Tocco, R., Liu, L., Berek, J. S., Hoang, L., Yussuf, A., Susswein, L., Esplin, E. D., Slavin, T. P., Gomez, S. L., Hofer, T. P., & Katz, S. J. (2023). Germline genetic testing after cancer diagnosis. JAMA: The Journal of the American Medical Association, 330, 43–51. https://doi.org/10.1001/jama.2023.9526
Kwasnieski, J. C., Mogno, I., Myers, C. A., Corbo, J. C., & Cohen, B. A. (2012). Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proceedings of the National Academy of Sciences of the United States of America, 109(47), 19498–19503. https://doi.org/10.1073/pnas.1210678109
Ledgister Hanchard, S. E., Dwyer, M. C., Liu, S., Hu, P., Tekendo-Ngongang, C., Waikel, R. L., Duong, D., & Solomon, B. D. (2022). Scoping review and classification of deep learning in medical genetics. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 24(8), 1593–1603. https://doi.org/10.1016/j.gim.2022.04.025
Liao, W.-W., Asri, M., Ebler, J., Doerr, D., Haukness, M., Hickey, G., Lu, S., Lucas, J. K., Monlong, J., Abel, H. J., Buonaiuto, S., Chang, X. H., Cheng, H., Chu, J., Colonna, V., Eizenga, J. M., Feng, X., Fischer, C., Fulton, R. S., … Paten, B. (2023). A draft human pangenome reference. Nature, 617(7960), 312–324. https://doi.org/10.1038/s41586-023-05896-x
Libbrecht, M. W., & Noble, W. S. (2015). Machine learning applications in genetics and genomics. Nature Reviews Genetics, 16(6), 321–332. https://doi.org/10.1038/nrg3920
Lincoln, S. E., Hambuch, T., Zook, J. M., Bristow, S. L., Hatchell, K., Truty, R., Kennemer, M., Shirts, B. H., Fellowes, A., Chowdhury, S., Klee, E. W., Mahamdallie, S., Cleveland, M. H., Vallone, P. M., Ding, Y., Seal, S., DeSilva, W., Tomson, F. L., Huang, C., … Nussbaum, R. L. (2021). One in seven pathogenic variants can be challenging to detect by NGS: An analysis of 450,000 patients with implications for clinical sensitivity and genetic test implementation. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 23(9), 1673–1680. https://doi.org/10.1038/s41436-021-01187-w
Lincoln, S. E., Truty, R., Lin, C.-F., Zook, J. M., Paul, J., Ramey, V. H., Salit, M., Rehm, H. L., Nussbaum, R. L., & Lebo, M. S. (2019). A rigorous interlaboratory examination of the need to confirm next-generation sequencing-detected variants with an orthogonal method in clinical genetic testing. The Journal of Molecular Diagnostics, 21(2), 318–329. https://doi.org/10.1016/j.jmoldx.2018.10.009
Linder, J. E., Bastarache, L., Hughey, J. J., & Peterson, J. F. (2021). The role of electronic health records in advancing genomic medicine. Annual Review of Genomics and Human Genetics, 22, 219–238. https://doi.org/10.1146/annurev-genom-121120-125204
Lin, J., Wang, S., Audano, P. A., Meng, D., Flores, J. I., Kosters, W., Yang, X., Jia, P., Marschall, T., Beck, C. R., & Ye, K. (2022). SVision: A deep learning approach to resolve complex structural variants. Nature Methods, 19(10), 1230–1233. https://doi.org/10.1038/s41592-022-01609-w
Luca, S., Clausen, M., Shaw, A., Lee, W., Krishnapillai, S., Adi-Wauran, E., Faghfoury, H., Costain, G., Jobling, R., Aronson, M., Liston, E., Silver, J., Shuman, C., Chad, L., Hayeems, R. Z., Bombard, Y., & Genetics Navigator Study Team. (2023). Finding the sweet spot: A qualitative study exploring patients’ acceptability of chatbots in genetic service delivery. Human Genetics, 142(3), 321–330. https://doi.org/10.1007/s00439-022-02512-2
Luo, L., Yan, S., Lai, P.-T., Veltri, D., Oler, A., Xirasagar, S., Ghosh, R., Similuk, M., Robinson, P. N., & Lu, Z. (2021). PhenoTagger: A hybrid method for phenotype concept recognition using human phenotype ontology. Bioinformatics, 37(13), 1884–1890. https://doi.org/10.1093/bioinformatics/btab019
Manolio, T. A., Goodhand, P., & Ginsburg, G. (2020). The international hundred thousand plus cohort consortium: Integrating large-scale cohorts to address global scientific challenges. The Lancet Digital Health, 2(11), e567–e568. https://doi.org/10.1016/S2589-7500(20)30242-9
Margulies, E. H., & Birney, E. (2008). Approaches to comparative sequence analysis: Towards a functional view of vertebrate genomes. Nature Reviews Genetics, 9(4), 303–313. https://doi.org/10.1038/nrg2185
Meng, L., Attali, R., Talmy, T., Regev, Y., Mizrahi, N., Smirin-Yosef, P., Vossaert, L., Taborda, C., Santana, M., Machol, I., Xiao, R., Dai, H., Eng, C., Xia, F., & Tzur, S. (2023). Evaluation of an automated genome interpretation model for rare disease routinely used in a clinical genetic laboratory. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 25(6), 100830. https://doi.org/10.1016/j.gim.2023.100830
Miller, D. T., Adam, M. P., Aradhya, S., Biesecker, L. G., Brothman, A. R., Carter, N. P., Church, D. M., Crolla, J. A., Eichler, E. E., Epstein, C. J., Faucett, W. A., Feuk, L., Friedman, J. M., Hamosh, A., Jackson, L., Kaminsky, E. B., Kok, K., Krantz, I. D., Kuhn, R. M., … Ledbetter, D. H. (2010). Consensus statement: Chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. American Journal of Human Genetics, 86(5), 749–764. https://doi.org/10.1016/j.ajhg.2010.04.006
Movaghar, A., Page, D., Scholze, D., Hong, J., DaWalt, L. S., Kuusisto, F., Stewart, R., Brilliant, M., & Mailick, M. (2021). Artificial intelligence-assisted phenotype discovery of fragile X syndrome in a population-based sample. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 23(7), 1273–1280. https://doi.org/10.1038/s41436-021-01144-7
Myers, L., Anderlid, B.-M., Nordgren, A., Lundin, K., Kuja-Halkola, R., Tammimies, K., & Bölte, S. (2020). Clinical versus automated assessments of morphological variants in twins with and without neurodevelopmental disorders. American Journal of Medical Genetics. Part A, 182(5), 1177–1189. https://doi.org/10.1002/ajmg.a.61545
Nazareth, S., Hayward, L., Simmons, E., Snir, M., Hatchell, K. E., Rojahn, S., Slotnick, R. N., & Nussbaum, R. L. (2021). Hereditary cancer risk using a genetic chatbot before routine care visits. Obstetrics and Gynecology, 138(6), 860–870. https://doi.org/10.1097/AOG.0000000000004596
Nazareth, S., Nussbaum, R. L., Siglen, E., & Wicklund, C. A. (2021). Chatbots & artificial intelligence to scale genetic information delivery. Journal of Genetic Counseling, 30(1), 7–10. https://doi.org/10.1002/jgc4.1359
Nicholls, H. L., John, C. R., Watson, D. S., Munroe, P. B., Barnes, M. R., & Cabrera, C. P. (2020). Reaching the end-game for GWAS: Machine learning approaches for the prioritization of complex disease loci. Frontiers in Genetics, 11, 350. https://doi.org/10.3389/fgene.2020.00350
O'Brien, T. D., Campbell, N. E., Potter, A. B., Letaw, J. H., Kulkarni, A., & Richards, C. S. (2022). Artificial intelligence (AI)-assisted exome reanalysis greatly aids in the identification of new positive cases and reduces analysis time in a clinical diagnostic laboratory. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 24(1), 192–200. https://doi.org/10.1016/j.gim.2021.09.007
Ohno, K., Takeda, J.-I., & Masuda, A. (2018). Rules and tools to predict the splicing effects of exonic and intronic mutations. Wiley Interdisciplinary Reviews. RNA, 9(1), 1–13. https://doi.org/10.1002/wrna.1451
Olson, N. D., Wagner, J., Dwarshuis, N., Miga, K. H., Sedlazeck, F. J., Salit, M., & Zook, J. M. (2023). Variant calling and benchmarking in an era of complete human genome sequences. Nature Reviews Genetics, 24, 464–483. https://doi.org/10.1038/s41576-023-00590-0
Osmond, M., Hartley, T., Johnstone, B., Andjic, S., Girdea, M., Gillespie, M., Buske, O., Dumitriu, S., Koltunova, V., Ramani, A., Boycott, K. M., & Brudno, M. (2022). PhenomeCentral: 7 years of rare disease matchmaking. Human Mutation, 43(6), 674–681. https://doi.org/10.1002/humu.24348
Owen, M. J., Lefebvre, S., Hansen, C., Kunard, C. M., Dimmock, D. P., Smith, L. D., Scharer, G., Mardach, R., Willis, M. J., Feigenbaum, A., Niemi, A.-K., Ding, Y., Van Der Kraan, L., Ellsworth, K., Guidugli, L., Lajoie, B. R., McPhail, T. K., Mehtalia, S. S., Chau, K. K., … Kingsmore, S. F. (2022). An automated 13.5 hour system for scalable diagnosis and acute management guidance for genetic diseases. Nature Communications, 13(1), 4057. https://doi.org/10.1038/s41467-022-31446-6
Özden, F., Alkan, C., & Çiçek, A. E. (2022). Polishing copy number variant calls on exome sequencing data via deep learning. Genome Research, 32(6), 1170–1182. https://doi.org/10.1101/gr.274845.120
Pejaver, V., Byrne, A. B., Feng, B.-J., Pagel, K. A., Mooney, S. D., Karchin, R., O'Donnell-Luria, A., Harrison, S. M., Tavtigian, S. V., Greenblatt, M. S., Biesecker, L. G., Radivojac, P., Brenner, S. E., & ClinGen Sequence Variant Interpretation Working Group. (2022). Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria. American Journal of Human Genetics, 109(12), 2163–2177. https://doi.org/10.1016/j.ajhg.2022.10.013
Peterson, B., Hernandez, E. J., Hobbs, C., Malone Jenkins, S., Moore, B., Rosales, E., Zoucha, S., Sanford, E., Bainbridge, M. N., Frise, E., Oriol, A., Brunelli, L., Kingsmore, S. F., & Yandell, M. (2023). Automated prioritization of sick newborns for whole genome sequencing using clinical natural language processing and machine learning. Genome Medicine, 15(1), 18. https://doi.org/10.1186/s13073-023-01166-7
Pilehvar, M. T., Bernard, A., Smedley, D., & Collier, N. (2022). PheneBank: A literature-based database of phenotypes. Bioinformatics, 38(4), 1179–1180. https://doi.org/10.1093/bioinformatics/btab740
Pina, A., Helgadottir, S., Mancina, R. M., Pavanello, C., Pirazzi, C., Montalcini, T., Henriques, R., Calabresi, L., Wiklund, O., Macedo, M. P., Valenti, L., Volpe, G., & Romeo, S. (2020). Virtual genetic diagnosis for familial hypercholesterolemia powered by machine learning. European Journal of Preventive Cardiology, 27(15), 1639–1646. https://doi.org/10.1177/2047487319898951
Piñero, J., Ramírez-Anguita, J. M., Saüch-Pitarch, J., Ronzano, F., Centeno, E., Sanz, F., & Furlong, L. I. (2020). The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research, 48(D1), D845–D855. https://doi.org/10.1093/nar/gkz1021
Popic, V., Rohlicek, C., Cunial, F., Hajirasouliha, I., Meleshko, D., Garimella, K., & Maheshwari, A. (2023). Cue: A deep-learning framework for structural variant discovery and genotyping. Nature Methods, 20(4), 559–568. https://doi.org/10.1038/s41592-023-01799-x
Poplin, R., Chang, P.-C., Alexander, D., Schwartz, S., Colthurst, T., Ku, A., Newburger, D., Dijamco, J., Nguyen, N., Afshar, P. T., Gross, S. S., Dorfman, L., McLean, C. Y., & DePristo, M. A. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10), 983–987. https://doi.org/10.1038/nbt.4235
Porras, A. R., Rosenbaum, K., Tor-Diez, C., Summar, M., & Linguraru, M. G. (2021). Development and evaluation of a machine learning-based point-of-care screening tool for genetic syndromes in children: A multinational retrospective study. The Lancet Digital Health, 3(10), e635–e643. https://doi.org/10.1016/S2589-7500(21)00137-0
Rajpurkar, P., Chen, E., Banerjee, O., & Topol, E. J. (2022). AI in health and medicine. Nature Medicine, 28, 31–38. https://www.nature.com/articles/s41591-021-01614-0
Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W. W., Hegde, M., Lyon, E., Spector, E., Voelkerding, K., Rehm, H. L., & ACMG Laboratory Quality Assurance Committee. (2015). Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 17(5), 405–424. https://doi.org/10.1038/gim.2015.30
Robinson, J. R., Wei, W.-Q., Roden, D. M., & Denny, J. C. (2018). Defining phenotypes from clinical data to drive genomic research. Annual Review of Biomedical Data Science, 1, 69–92. https://doi.org/10.1146/annurev-biodatasci-080917-013335
Robinson, P. N., Köhler, S., Bauer, S., Seelow, D., Horn, D., & Mundlos, S. (2008). The human phenotype ontology: A tool for annotating and analyzing human hereditary disease. American Journal of Human Genetics, 83(5), 610–615. https://doi.org/10.1016/j.ajhg.2008.09.017
Robinson, P. N., Köhler, S., Oellrich, A., Sanger Mouse Genetics Project, Wang, K., Mungall, C. J., Lewis, S. E., Washington, N., Bauer, S., Seelow, D., Krawitz, P., Gilissen, C., Haendel, M., & Smedley, D. (2014). Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Research, 24(2), 340–348. https://doi.org/10.1101/gr.160325.113
Romero, P. A., Tran, T. M., & Abate, A. R. (2015). Dissecting enzyme function with microfluidic-based deep mutational scanning. Proceedings of the National Academy of Sciences of the United States of America, 112(23), 7159–7164. https://doi.org/10.1073/pnas.1422285112
Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed.). Pearson.
Safarova, M. S., Liu, H., & Kullo, I. J. (2016). Rapid identification of familial hypercholesterolemia from electronic health records: The SEARCH study. Journal of Clinical Lipidology, 10(5), 1230–1239. https://doi.org/10.1016/j.jacl.2016.08.001
Schmidlen, T., Jones, C. L., Campbell-Salome, G., McCormick, C. Z., Vanenkevort, E., & Sturm, A. C. (2022). Use of a chatbot to increase uptake of cascade genetic testing. Journal of Genetic Counseling, 31(5), 1219–1230. https://doi.org/10.1002/jgc4.1592
Schmidlen, T., Schwartz, M., DiLoreto, K., Kirchner, H. L., & Sturm, A. C. (2019). Patient assessment of chatbots for the scalable delivery of genetic counseling. Journal of Genetic Counseling, 28(6), 1166–1177. https://doi.org/10.1002/jgc4.1169
Shefchek, K. A., Harris, N. L., Gargano, M., Matentzoglu, N., Unni, D., Brush, M., Keith, D., Conlin, T., Vasilevsky, N., Zhang, X. A., Balhoff, J. P., Babb, L., Bello, S. M., Blau, H., Bradford, Y., Carbon, S., Carmody, L., Chan, L. E., Cipriani, V., … Osumi-Sutherland, D. (2020). The monarch initiative in 2019: An integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research, 48(D1), D704–D715. https://doi.org/10.1093/nar/gkz997
Sheth, S., Lee, P., Bajaj, A., Cuchel, M., Hajj, J., Soffer, D. E., Webb, G., Hossain, E., Borovskiy, Y., Risman, M., Myers, K. D., Wilemon, K. A., Rader, D. J., & Jacoby, D. (2021). Implementation of a machine-learning algorithm in the electronic health record for targeted screening for familial hypercholesterolemia: A quality improvement study. Circulation: Cardiovascular Quality and Outcomes, 14(6), e007641. https://doi.org/10.1161/CIRCOUTCOMES.120.007641
Siglen, E., Vetti, H. H., Lunde, A. B. F., Hatlebrekke, T. A., Strømsvik, N., Hamang, A., Hovland, S. T., Rettberg, J. W., Steen, V. M., & Bjorvatn, C. (2022). Ask Rosa – The making of a digital genetic conversation tool, a chatbot, about hereditary breast and ovarian cancer. Patient Education and Counseling, 105(6), 1488–1494. https://doi.org/10.1016/j.pec.2021.09.027
Silvestri, M., Nghia Vu, T., Nichetti, F., Niger, M., Di Cosimo, S., De Braud, F., Pruneri, G., Pawitan, Y., Calza, S., & Cappelletti, V. (2023). Comprehensive transcriptomic analysis to identify biological and clinical differences in cholangiocarcinoma. Cancer Medicine, 12(8), 10156–10168. https://doi.org/10.1002/cam4.5719
Simmons, M., Singhal, A., & Lu, Z. (2016). Text mining for precision medicine: Bringing structure to EHRs and biomedical literature to understand genes and health. Advances in Experimental Medicine and Biology, 939, 139–166. https://doi.org/10.1007/978-981-10-1503-8_7
Smith, E. D., Savage, S. K., Andrew, E. H., Martin, G. M., Kahn-Kirby, A. H., LoTempio, J., Délot, E., Cohen, A. J., Pitsava, G., Berger, S., Fusaro, V. A., & Vilain, E. (2023). Development and implementation of novel chatbot-based genomic research consent. bioRxiv.Org: The Preprint Server for Biology. https://doi.org/10.1101/2023.01.23.525221
Snir, M., Nazareth, S., Simmons, E., Hayward, L., Ashcraft, K., Bristow, S. L., Esplin, E. D., & Aradhya, S. (2021). Democratizing genomics: Leveraging software to make genetics an integral part of routine care. American Journal of Medical Genetics. Part C, Seminars in Medical Genetics, 187(1), 14–27. https://doi.org/10.1002/ajmg.c.31866
Solomon, B. D. (2022). Can artificial intelligence save medical genetics? American Journal of Medical Genetics. Part A, 188(2), 397–399. https://doi.org/10.1002/ajmg.a.62538
Son, J. H., Xie, G., Yuan, C., Ena, L., Li, Z., Goldstein, A., Huang, L., Wang, L., Shen, F., Liu, H., Mehl, K., Groopman, E. E., Marasa, M., Kiryluk, K., Gharavi, A. G., Chung, W. K., Hripcsak, G., Friedman, C., Weng, C., & Wang, K. (2018). Deep phenotyping on electronic health records facilitates genetic diagnosis by clinical exomes. American Journal of Human Genetics, 103(1), 58–73. https://doi.org/10.1016/j.ajhg.2018.05.010
Steinfeldt, J., Buergel, T., Loock, L., Kittner, P., Ruyoga, G., Zu Belzen, J. U., Sasse, S., Strangalies, H., Christmann, L., Hollmann, N., Wolf, B., Ference, B., Deanfield, J., Landmesser, U., & Eils, R. (2022). Neural network-based integration of polygenic and clinical information: Development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort. The Lancet Digital Health, 4(2), e84–e94. https://doi.org/10.1016/S2589-7500(21)00249-1
Stenzinger, A., Alber, M., Allgäuer, M., Jurmeister, P., Bockmayr, M., Budczies, J., Lennerz, J., Eschrich, J., Kazdal, D., Schirmacher, P., Wagner, A. H., Tacke, F., Capper, D., Müller, K.-R., & Klauschen, F. (2022). Artificial intelligence and pathology: From principles to practice and future applications in histomorphology and molecular profiling. Seminars in Cancer Biology, 84, 129–143. https://doi.org/10.1016/j.semcancer.2021.02.011
Sun, C. X., Daniel, P., Bradshaw, G., Shi, H., Loi, M., Chew, N., Parackal, S., Tsui, V., Liang, Y., Koptyra, M., Adjumain, S., Sun, C., Chong, W. C., Fernando, D., Drinkwater, C., Tourchi, M., Habarakada, D., Sooraj, D., Carvalho, D., … Firestein, R. (2023). Generation and multi-dimensional profiling of a childhood cancer cell line atlas defines new therapeutic opportunities. Cancer Cell, 41(4), 660–677.e7. https://doi.org/10.1016/j.ccell.2023.03.007
Truty, R., Paul, J., Kennemer, M., Lincoln, S. E., Olivares, E., Nussbaum, R. L., & Aradhya, S. (2019). Prevalence and properties of intragenic copy-number variation in Mendelian disease genes. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 21(1), 114–123. https://doi.org/10.1038/s41436-018-0033-5
Tseng, J.-J., Lu, C.-H., Li, J.-Z., Lai, H.-Y., Chen, M.-H., Cheng, F.-Y., & Kuo, C.-E. (2023). An open dataset of annotated metaphase cell images for chromosome identification. Scientific Data, 10(1), 104. https://doi.org/10.1038/s41597-023-02003-7
Vajen, B., Hänselmann, S., Lutterloh, F., Käfer, S., Espenkötter, J., Beening, A., Bogin, J., Schlegelberger, B., & Göhring, G. (2022). Classification of fluorescent R-band metaphase chromosomes using a convolutional neural network is precise and fast in generating karyograms of hematologic neoplastic cells. Cancer Genetics, 260–261, 23–29. https://doi.org/10.1016/j.cancergen.2021.11.005
Välipakka, S., Savarese, M., Sagath, L., Arumilli, M., Giugliano, T., Udd, B., & Hackman, P. (2020). Improving copy number variant detection from sequencing data with a combination of programs and a predictive model. The Journal of Molecular Diagnostics, 22(1), 40–49. https://doi.org/10.1016/j.jmoldx.2019.08.009
Variant effect predictors. (n.d.). Atlas of variant effects alliance. Retrieved May 30, 2023, from: https://www.varianteffect.org/veps
Visibelli, A., Roncaglia, B., Spiga, O., & Santucci, A. (2023). The impact of artificial intelligence in the odyssey of rare diseases. Biomedicines, 11(3), 887. https://doi.org/10.3390/biomedicines11030887
Walter, W., Haferlach, C., Nadarajah, N., Schmidts, I., Kühn, C., Kern, W., & Haferlach, T. (2021). How artificial intelligence might disrupt diagnostics in hematology in the near future. Oncogene, 40(25), 4271–4280. https://doi.org/10.1038/s41388-021-01861-y
Wang, T., Antonacci-Fulton, L., Howe, K., Lawson, H. A., Lucas, J. K., Phillippy, A. M., Popejoy, A. B., Asri, M., Carson, C., Chaisson, M. J. P., Chang, X., Cook-Deegan, R., Felsenfeld, A. L., Fulton, R. S., Garrison, E. P., Garrison, N. A., Graves-Lindsay, T. A., Ji, H., Kenny, E. E., … Human Pangenome Reference Consortium. (2022). The human pangenome project: A global resource to map genomic diversity. Nature, 604(7906), 437–446. https://doi.org/10.1038/s41586-022-04601-8
Wright, C. F., Campbell, P., Eberhardt, R. Y., Aitken, S., Perrett, D., Brent, S., Danecek, P., Gardner, E. J., Chundru, V. K., Lindsay, S. J., Andrews, K., Hampstead, J., Kaplanis, J., Samocha, K. E., Middleton, A., Foreman, J., Hobson, R. J., Parker, M. J., Martin, H. C., … DDD Study. (2023). Genomic diagnosis of rare pediatric disease in the United Kingdom and Ireland. The New England Journal of Medicine, 388(17), 1559–1571. https://doi.org/10.1056/NEJMoa2209046
Yang, Z., Shikany, A., Ni, Y., Zhang, G., Weaver, K. N., & Chen, J. (2022). Using deep learning and electronic health records to detect Noonan syndrome in pediatric patients. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 24(11), 2329–2337. https://doi.org/10.1016/j.gim.2022.08.002
Yu, K. H., Beam, A. L., & Kohane, I. S. (2018). Artificial intelligence in healthcare. Nature Biomedical Engineering, 2(10), 719–731. https://doi.org/10.1038/s41551-018-0305-z
Zeng, T., & Li, Y. I. (2022). Predicting RNA splicing from DNA sequence using pangolin. Genome Biology, 23(1), 103. https://doi.org/10.1186/s13059-022-02664-4

How to cite this article: Aradhya, S., Facio, F. M., Metz, H., Manders, T., Colavin, A., Kobayashi, Y., Nykamp, K., Johnson, B., & Nussbaum, R. L. (2023). Applications of artificial intelligence in clinical laboratory genomics. American Journal of Medical Genetics Part C: Seminars in Medical Genetics, 193C: e32057. https://doi.org/10.1002/ajmg.c.32057

2023 - Aradhya -

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2023 - Aradhya -

Uploaded by

Copyright:

Available Formats
