Application of Machine Learning Tools and Integrated OMICS For Screening and Diagnosis of Inborn Errors of Metabolism
https://doi.org/10.1007/s11306-023-02013-x
ORIGINAL ARTICLE
Received: 31 October 2022 / Accepted: 20 April 2023 / Published online: 3 May 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Abstract
Introduction Tandem mass spectrometry (TMS) has emerged as an important screening tool for various metabolic disorders in newborns. However, there is an inherent risk of false positive outcomes.
Objective To establish analyte-specific cut-offs in TMS by integrating metabolomics and genomics data to avoid false positivity and false negativity and improve its clinical utility.
Methods TMS was performed on 572 healthy and 3000 referred newborns. Urine organic acid analysis identified 23 types of inborn errors in 99 referred newborns. Whole exome sequencing was performed in 30 positive cases. The impact of physiological changes such as age, gender, and birthweight on various analytes was explored in healthy newborns. Machine learning tools were used to integrate demographic data with metabolomics and genomics data to establish disease-specific cut-offs; to identify primary and secondary markers; to build classification and regression trees (CART) for better differential diagnosis; and for pathway modeling.
Results This integration helped in differentiating B12 deficiency from methylmalonic acidemia (MMA) and propionic acidemia (Phi coefficient = 0.93); differentiating transient tyrosinemia from tyrosinemia type 1 (Phi coefficient = 1.00); getting clues about the possible molecular defect in MMA to initiate appropriate intervention (Phi coefficient = 1.00); and linking pathogenicity scores with the metabolomics profile in tyrosinemia (r² = 0.92). The CART model helped in establishing the differential diagnosis of urea cycle disorders (Phi coefficient = 1.00).
Conclusion Calibrated cut-offs of different analytes in TMS and machine learning-based establishment of disease-specific thresholds of these markers through integrated OMICS have improved differential diagnosis, with a significant reduction in the false positivity and false negativity rates.
Keywords Inborn errors of metabolism · Newborn screening · Tandem mass spectrometry · Machine learning ·
Integrated OMICS · Cut-off values
49 Page 2 of 10 G. Usha Rani et al.
have helped in identifying the high prevalence of treatable metabolic disorders in India. The initial study was conducted in the 1980s using thin layer chromatography (TLC), where 98,256 newborns were tested and 46 amino acid disorders were identified, giving an incidence of 1 in 2136 for amino acidopathies (Rao et al., 1988). The first systematic NBS was conducted in Andhra Pradesh, where 20,000 newborns born in all major government hospitals of Hyderabad were tested for treatable IEMs. TLC and High-Performance Liquid Chromatography (HPLC) were used for amino acid analysis. This study showed an incidence of 1:3660 for amino acidopathies (Rama Devi et al., 2004). TMS-based NBS has been popular in the last decade. An initial study on 2550 clinically symptomatic children showed a 3.2% positivity rate, with 54% of positive cases showing amino acidopathies, 41.6% showing organic acidemias, and 4.4% showing FAOD (Nagaraja et al., 2010). A population-based study representing 4946 newborns from the rural population of Andhra Pradesh identified 5 positive cases, giving an incidence of 1:1000. The identified disorders were Carnitine uptake disease, Isovaleric acidemia, Glutaric aciduria type-I, and Glutaric aciduria type-II (Sahai et al., 2011). Although TMS is currently being performed in many corporate hospitals, there are no comprehensive studies that include confirmatory and genetic tests for a better genotype-phenotype correlation. In the current study, we analyzed all screen-positive cases further using Gas Chromatography-Mass Spectrometry (GC-MS) for organic acid analysis. Whole exome sequencing (WES) was performed to identify the causative genetic defects, subject to the consent of the parents. Machine learning (ML) tools were applied to correlate metabolomics data with genomics data. Such correlation was explored for establishing differential diagnosis criteria for certain metabolic disorders. It would therefore be possible to initiate therapy without waiting for the genetic report so that any damage to the organs can be avoided.

In the current study, we have calibrated the cut-offs of different analytes in TMS and used ML tools to provide meaningful insights toward differential diagnosis through integrated OMICS. This approach minimizes false positives and negatives and guides toward a specific diagnosis.

2 Materials and methods

2.1 Study subjects

The baseline data were established on the basis of 572 newborns (0–7 months), 234 females and 338 males, who were healthy and had no clinical symptoms suggestive of metabolic disease during 3–4 months of follow-up. The number of participants in each subgroup based on age and gender is tabulated in Supplementary Table 1. The mean birth weight was 2.72 ± 0.71 kg. Subsequently, 3000 newborn samples referred from different hospitals were screened for IEMs using TMS. Pre-analytical requirements, i.e., sample collection, storage, and transportation, and analytical requirements such as quality control and reagent stability were taken care of. Samples that did not pass pre-analytical and analytical criteria were excluded from the analysis. Whatman 903 grade filter paper was used for collecting blood samples. The DBS samples were collected by heel prick after 48 h of birth, following two feeds from the mother.

Urine samples were collected from patients with informed consent from their parents. Frozen urine samples or urine-soaked filter papers were recommended for samples received from the outstation.

2.2 LC-MS/MS analysis

The derivatized amino acid and acylcarnitine reagent kit (ClinSpot® LC-MS/MS Complete Kit) was purchased from Recipe (Part no. MS10000). This analytical method directly determines 49 analytes of amino acids and acylcarnitines (free carnitine, 35 acylcarnitines, and 13 amino acids), without chromatographic separation on an HPLC column, via tandem mass spectrometry (MS/MS) at a constant flow rate of 0.1 ml/min. The runtime was 1 min with the isocratic program. The detection was performed on an LC-MS/MS system (LCMS-8045, Shimadzu Co., Ltd., Japan), which was operated in positive ion mode (capillary voltage 2.0 kV, desolvation temperature 526 °C). The internal standard was used to calculate the analyte concentration of the sample. Quantification of analytes was achieved using peak areas that were processed using Neonatal software.

2.3 GC/MS analysis

Urine organic acid analysis was performed as per the published protocols (Tanaka et al., 1980; Kimura et al., 1999). Hydroxylamine hydrochloride, margaric acid, hydrochloric acid, ethyl acetate, sodium sulphate, and sodium hydroxide were purchased from Merck. Hydrocarbon mixture (C10-26) was from GL Sciences, and urease, N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) with 1% trimethylchlorosilane (TMCS), and tropic acid were procured from Sigma Aldrich.

The GC/MS system (Nexis GC-2030, GCMS-QP 2020 NX; Shimadzu Co., Ltd., Japan) was fitted with a fused silica Rtx-5MS capillary column (30 m × 0.25 mm) with a 0.25 μm film thickness of diphenyl dimethyl polysiloxane. Standard electron impact ionization scanning resulted in the Mass spectra
in the range of m/z 35–600 at the rate of 0.4 s/cycle. The temperature program was started at 100 °C with an initial hold of 4 min and was increased at the rate of 4 °C/min to 290 °C, with a final hold of 10 min. The temperatures of the injection port and interface were both 280 °C. The flow rate of the helium carrier was 1.5 ml/min, and the linear velocity was 40.2 m/sec. A final derivatized aliquot of 1 µl was injected into the GC/MS in splitless mode. Data acquisition was performed in the scan mode using a mass-to-charge ratio. The run started at 2.08 min and ended at 60.86 min, with the ion source temperature at 200 °C.

2.4 Whole exome sequencing

WES was performed on the next-generation sequencing platform using the MGI machine (DNBSEQ-G50). The Twist Bioscience Kit was used for preparing libraries for WES. The concentration of each library was determined using the QUBIT 2.0 Fluorometer. Samples were pooled before sequencing with each sample at a final concentration of 1.8 pM. Sequencing was performed on the DNBSEQ-G50 platform using 150 cycles and paired-end chemistry, targeting 100x coverage.

2.5 Machine learning algorithm: classification and regression (CART) model

Demographic and metabolic data were used as the input variables and the diagnosis was used as the output to identify disease-specific thresholds of acylcarnitines and amino acids. The CART model was built as proposed by Leo Breiman. This is a binary tree algorithm with the most important determinant at the apex of the tree and subsequent branches formed with other variables in descending order of importance. Each node was representative of a single input variable. Branching was based on a specific threshold of that variable that predicts the outcome variable reasonably well. The extent of branching was optimized by pruning the tree to achieve the required prediction with minimal branching. If multiple markers were suggestive of a group of metabolic disorders, differential diagnostic strategies were deduced on the basis of these decision trees. The CART model was validated using cases where molecular confirmation was available.

2.6 Statistical analysis

The validity of each decision level of the CART model was assessed on the basis of the number of True Positives, True Negatives, False Positives and False Negatives computed in a 2 × 2 contingency table. Fisher's exact test was performed for the calculation of the performance characteristics of the model. We used Student's t-test for analyzing the impact of gender differences on each analyte. One-way ANOVA was used to assess the changes in metabolites with growing age (from birth to 6–7 months of life). Computational platforms like https://statpages.info/ and https://www.socscistatistics.com were employed for the analysis.

3 Results

3.1 Establishing population-specific reference ranges for tandem mass spectrometry

The data from 572 healthy newborns were used to establish the population-specific reference ranges (1st–99th percentile) of acylcarnitines and amino acids. The upper limit of each analyte in controls was considered as the cut-off. Our cut-offs showed good correlation (r² = 0.906) with the cut-offs of the Centers for Disease Control and Prevention (CDC).

Arginine, citrulline, and valine showed a positive correlation with age. C2, C16, and C16:1 acylcarnitines showed an inverse association with age, while C8 and C4DC showed a positive association. No statistically significant gender differences were observed in the amino acid and acylcarnitine profiles. Except for C16 and C6DC, none of the analytes showed a statistically significant association with birth weight after Bonferroni correction (Table 1).

Gender differences in the distribution of glutamic acid, glycine, C5, C6DC, C16OH, C14:1, and C16:1OH were observed (Supplemental Table 2). Additionally, the reference ranges changed significantly from 1 month to 6 months of age in infancy, specifically for the analytes aspartic acid, citrulline, glutamic acid, glycine, phenylalanine, C16, C18, C4OH, C4DC, C16:1, and C18:2 in females (Supplemental Table 3). Similarly, arginine, aspartic acid, glutamic acid, glycine, leucine, ornithine, phenylalanine, C5DC, C16, C18, C4DC, C16OH, C14:1, C16:1, C18:2, and C18:1 were affected by age in males (Supplemental Table 4). These differences can be attributed to altered metabolic rates with age.

3.2 Incidence of IEMs among referred cases

The screen-positive cases on TMS were further confirmed by urine organic acid analysis on GC/MS. The metabolite elevations in the urine were consistent with the findings observed on TMS. Among amino acid disorders, Tyrosinemia and Maple syrup urine disease are the most common, followed by hypermethioninemia, urea cycle disorder, and phenylketonuria (PKU). Among organic acidurias, Propionic acidemia (PA) and Glutaric acidemia type-I (GA-1) are the most common, followed by Methylmalonic acidemia (MMA), beta-keto thiolase deficiency, Isovaleric acidemia (IVA), Carnitine uptake disease, and 3-hydroxy-3-methylglutaryl-CoA lyase deficiency (HMG-CoA lyase deficiency). Among FAODs, MCAD deficiency is the most common, followed by Carnitine palmitoyltransferase I (CPT I) deficiency, Long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) deficiency, and Short-chain 3-hydroxyacyl-CoA dehydrogenase (SCHAD) deficiency. Among Urea cycle disorders, Argininosuccinate synthetase 1 (ASS1) deficiency is the most common, followed by Argininemia (Table 2).
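The percentile-based reference ranges of Sect. 3.1 (1st–99th percentile of healthy controls, with the upper limit taken as the cut-off) can be illustrated with a minimal sketch. The function names and the linear-interpolation percentile rule are assumptions made for illustration; the article does not state which percentile estimator was used.

```python
import math


def percentile(values, p):
    """Percentile (p in 0..100) of a non-empty list, by linear interpolation.

    The interpolation rule is an assumption; other estimators exist.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def reference_range(control_values):
    """Return the (1st percentile, 99th percentile) of the control values."""
    return percentile(control_values, 1), percentile(control_values, 99)


def exceeds_cutoff(value, control_values):
    """Flag a measurement above the upper reference limit (the cut-off)."""
    _, upper = reference_range(control_values)
    return value > upper
```

Here `control_values` stands for the concentrations of one analyte in the healthy cohort; in practice one such range would be derived per analyte, and per age group where the analyte varies with age.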
Table 2 Incidence rate of inborn errors of metabolism (IEMs) (n = 3000)

Disorder | No. of confirmed cases | Incidence
Amino acid disorders
Phenylketonuria | 2 | 1:1500
Maple syrup urine disease (MSUD) | 5 | 1:600
Argininemia | 4 | 1:750
Citrullinemia | 2 | 1:1500
Hypermethioninemia | 4 | 1:750
Tyrosinemia | 7 | 1:430
Organic acid disorders
Carnitine uptake defect (CUD) | 2 | 1:1500
Propionic aciduria (PA) | 8 | 1:360
Methylmalonic aciduria (MMA) | 6 | 1:500
Isovaleric acidemia (IVA) | 2 | 1:1500
Beta keto thiolase (BKT) | 5 | 1:600
Glutaric aciduria type-1 (GA-1) | 8 | 1:360
3-hydroxy-3-methylglutaryl-CoA lyase (HMG-CoA lyase) deficiency | 2 | 1:1500
Fatty acid oxidation disorders
Medium-chain acyl-CoA dehydrogenase (MCAD) deficiency | 9 | 1:330
Carnitine palmitoyltransferase I (CPT I) deficiency | 3 | 1:1000
Long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) deficiency | 1 | 1:3000
Short-chain acyl-CoA dehydrogenase (SCAD) deficiency | 1 | 1:3000
Urea Cycle Disorders
N-acetylglutamate synthetase (NAGS) deficiency | 1 | 1:3000
Carbamoyl phosphate synthetase I (CPS1) deficiency | 1 | 1:3000
Ornithine transcarbamylase (OTC) deficiency | 2 | 1:1500
Argininosuccinate synthetase 1 (ASS1) deficiency | 11 | 1:272
Argininosuccinate lyase deficiency | 4 | 1:750
Argininemia | 9 | 1:330
Total | 99 | 1:30

3.3 Utility of ML in the differential diagnosis of IEMs

C3 deficiency is the most common abnormality observed in Indian children. Through ML, we have identified different disease-specific thresholds for C3 and the C3/C2 ratio that help in distinguishing normal children from those with vitamin B12 deficiency and MMA or PA (Phi coefficient = 0.93). C3 < 5.22 µmol/L and a C3/C2 ratio in the range of 0.38–0.68 suggest vitamin B12 deficiency. Patients with MMA have C3 > 5.22 µmol/L and a C3/C2 ratio between 0.68 and 3.2. In PA patients, the level of C3 is > 7.38 µmol/L and the C3/C2 ratio is 0.72–4.29. TMS analysis alone cannot differentiate MMA and PA. However, urine organic acid analysis by GC/MS will help in the differential diagnosis. These ML-derived thresholds showed 87.5% accuracy in diagnosing B12 deficiency and 100% accuracy in segregating the data into normal, MMA, or PA, with an overall precision of 98.68% (Fig. 1). In GA-1 and IVA patients, the cut-off values of C5DC and C5 were > 0.6 µmol/L and > 2.18 µmol/L, respectively.

The most frequent amino acid abnormality in Indian newborns is tyrosine elevation. We have established thresholds of tyrosine and methionine that can differentiate transient Tyrosinemia from classical Tyrosinemia type I (Phi coefficient: 1.00). Tyrosine > 235 µmol/L and methionine > 34 µmol/L indicate Tyrosinemia type I (Fig. 2). Tyrosine levels between 167 and 234.72 µmol/L and methionine levels between 17 and 61.2 µmol/L indicate transient tyrosinemia.

3.4 Differential diagnosis of Urea Cycle Disorders (UCDs)

During our study, we identified 28 UCDs with molecular confirmation. Two cases with low levels of citrulline and arginine with orotic aciduria were diagnosed with Ornithine transcarbamylase deficiency (OTC c.533C > T, OTC c.275G > A). Two cases with a normal urine organic acid profile had NAGS (c.702-4delA) and CPS1 (c.446G > A) deficiencies. The most frequent UCD is citrullinemia. Citrulline > 417.5 µmol/L indicated ASS1 deficiency (c.1088G > A, c.1168G > A). Citrulline < 417.5 µmol/L and glutamine can differentiate between ASS1 (n = 11) and ASL (n = 4), where ASL showed glutamine > 376.95 µmol/L. In Argininemia (n = 9), the disease-specific threshold for Arginine was > 195.7 µmol/L (ARG1 c.899C > G).

3.5 Integration of OMICS in the differential diagnosis of IEMs by ML

Integration of demographic data with metabolomics and genomics using ML has given subtype-specific thresholds for further differentiation of MMA cases. If the age of onset is > 27 months, the most likely MMA subtype is cblC deficiency. If the age of onset is < 9 months and the C3/C2 ratio is < 10.35, the subtype is most likely MUT deficiency; a C3/C2 ratio > 10.35 suggests MUT or cblB deficiency. If the age of onset is between 9 and 27 months, MMA elevation < 221-fold suggests cblA deficiency. MMA elevation > 221-fold with a C3/C2 ratio between 11.89 and 12.22 is also suggestive of cblA deficiency. In MUT deficiency, the C3/C2 ratio is > 12.33 with significant elevation of MMA. This prediction model showed 100% specificity and sensitivity in the differential diagnosis of MMA (Fig. 3). The sensitivity and specificity of the standard procedure used in our laboratory were 95.8% and 98.7%, respectively. The ML-based approach improved the sensitivity and specificity to 98.2% and 99.8%, respectively.
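The performance figures quoted here (sensitivity, specificity, Phi coefficient) all derive from the 2 × 2 contingency table described in Sect. 2.6. The following is a minimal sketch of those standard formulas; the function name and the choice to return NaN for an undefined ratio are illustrative assumptions, not the authors' implementation.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and Phi coefficient from 2 x 2 counts.

    tp, fp, fn, tn are the True Positive, False Positive, False Negative
    and True Negative counts. Undefined ratios are returned as NaN.
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    # Phi is the Pearson correlation between predicted and true labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else nan
    return {"sensitivity": sensitivity, "specificity": specificity, "phi": phi}
```

A Phi coefficient of 1.00 corresponds to a table with no false positives and no false negatives, which is why it coincides with 100% sensitivity and specificity.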
Fig. 1 Utility of C3 and C3/C2 ratio in the differential diagnosis of B12 deficiency, methylmalonic acidemia, and propionic acidemia
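Thresholds of the kind shown in Fig. 1 are the output of CART-style splitting (Sect. 2.5). As a minimal sketch of how one such split could be chosen, the code below scans the candidate thresholds of a single variable and keeps the one with the lowest weighted Gini impurity. It is a generic single-split illustration using the Gini criterion commonly associated with CART, not the authors' model; the function names are hypothetical, and pruning and multi-variable trees are omitted.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values; samples with value <= threshold go to the left child.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity
```

A full tree repeats this search over every input variable at every node and then prunes branches that add little predictive value, as described in Sect. 2.5.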
Fig. 2 Utility of tyrosine and methionine levels in distinguishing transient tyrosinemia from tyrosinemia type 1
Tyrosine (Tyr) and Methionine (Meth) levels were able to distinguish transient tyrosinemia from tyrosinemia type 1. WES analysis revealed nine Tyrosinemia type I cases showed the following FAH mutations namely c.192G > T, c.941T > C, c.983A > G, c.998delA, c.1159G > A, c.1211G > A and c.709C > T. Three cases exhibited c.192G > T mutation.
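The tyrosine and methionine thresholds reported in Sect. 3.3 can be written as a small rule function. This is a sketch only: the function name, the inclusive handling of the range boundaries, and the order in which the rules are checked are assumptions, and values outside both rules are reported as unmatched rather than as normal.

```python
def classify_tyrosine_screen(tyr_umol_l, met_umol_l):
    """Apply the reported Tyr/Met thresholds (both in µmol/L).

    Rule order and inclusive range bounds are illustrative assumptions.
    """
    # Tyrosine > 235 and methionine > 34 indicate tyrosinemia type I.
    if tyr_umol_l > 235 and met_umol_l > 34:
        return "tyrosinemia type I"
    # Tyrosine 167-234.72 and methionine 17-61.2 indicate the transient form.
    if 167 <= tyr_umol_l <= 234.72 and 17 <= met_umol_l <= 61.2:
        return "transient tyrosinemia"
    return "no rule matched"
```

Because the two tyrosine ranges do not overlap, the order of the checks does not change the result; a sample falling in the narrow gap between 234.72 and 235 µmol/L matches neither rule.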
the amino acid and acylcarnitine profile. The positivity rate in referred cases in our study was 2.37%, while the positivity rate was 1.4% in another large-scale study (Babu et al., 2015). This study is in agreement with our study in identifying methylmalonic aciduria, glutaric acidemia type 1, propionic aciduria, maple syrup urine disease, phenylketonuria, and tyrosinemia as the most frequent IEMs (Hampe et al., 2017). Only two analytes showed an association with birth weight, i.e., C16 and C6DC. However, neither of these markers is individually specific for any metabolic disorder. C16 elevation along with C18, C18:1 and C18:2 is an indicator of CPT II deficiency or carnitine/acylcarnitine translocase deficiency. In view of both primary and secondary markers, the diagnosis will not be affected by birth weight.

Classification and regression trees based on metabolite levels and their mutual ratios have been used in newborn screening for a long time. C8, C10, and C8/C2 were used for detecting medium chain acyl CoA dehydrogenase (MCAD) deficiency in a study from Belgium (Van den Bulcke et al., 2011). Phenylalanine hydroxylase deficiency, MCAD deficiency, and methylcrotonyl CoA carboxylase (3-MCC) deficiency were detected with > 95.2% and false-positive rate < 0.001% with machine learning (Baumgartner et al., 2005).

We have used ML tools to reduce the false positivity rate in NBS by defining disease-specific thresholds, and by identifying primary and secondary markers for differential diagnosis. A recent study also used a similar approach with a Random Forest machine learning classifier that reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCAD deficiency by 2% (Peng et al., 2020). ML models based on diagnostic cut-offs reduced the number of false positive cases from 21 to 2 for phenylketonuria, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency (Chen et al., 2013).

The utility of integrating metabolomics data with genomics has been investigated earlier by coupling untargeted metabolomics upstream or downstream to the primary reaction with in silico-simulated WES results to increase the diagnostic value (Kerkhofs et al., 2020). We have developed a pathogenicity score prediction model using metabolomics data in patients with tyrosinemia. In the current study, we have integrated TMS, GC-MS, and genomics data to derive an ML model capable of differential diagnosis. This model could give possible clues about the subtypes of methylmalonic acidemia based on the age of onset, C3/C2 ratio, and urinary methylmalonic acid content. This information will be of clinical utility in initiating the therapy as early as possible as waiting for the molecular report may adversely affect the
using machine learning. International journal of neonatal screening, 6(1), 16.
Rama Devi, A. R., & Naushad, S. M. (2004). Newborn screening in India. Indian journal of pediatrics, 71(2), 157–160.
Rao, N. A., Devi, A. R., Savithri, H. S., Rao, S. V., & Bittles, A. H. (1988). Neonatal screening for amino acidaemias in Karnataka, south India. Clinical genetics, 34(1), 60–63.
Sahai, I., Zytkowicz, T., Rao Kotthuri, S., Lakshmi Kotthuri, A., Eaton, R. B., & Akella, R. R. (2011). Neonatal screening for inborn errors of metabolism using tandem mass spectrometry: Experience of the pilot study in Andhra Pradesh, India. Indian journal of pediatrics, 78(8), 953–960.
Tanaka, K., West-Dull, A., Hine, D. G., Lynn, T. B., & Lowe, T. (1980). Gas-chromatographic method of analysis for urinary organic acids. II. Description of the procedure, and its application to diagnosis of patients with organic acidurias. Clinical chemistry, 26(13), 1847–1853.
Teodoro-Morrison, T., Kyriakopoulou, L., Chen, Y. K., Raizman, J. E., Bevilacqua, V., Chan, M. K., Wan, B., Yazdanpanah, M., Schulze, A., & Adeli, K. (2015). Dynamic biological changes in metabolic disease biomarkers in childhood and adolescence: A CALIPER study of healthy community children. Clinical biochemistry, 48(13–14), 828–836.
Van den Bulcke, T., Vanden Broucke, P., Van Hoof, V., Wouters, K., Vanden Broucke, S., Smits, G., Smits, E., Proesmans, S., Van Genechten, T., & Eyskens, F. (2011). Data mining methods for classification of Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) using non-derivatized tandem MS neonatal screening data. Journal of biomedical informatics, 44(2), 319–325.
Wilcken, B., Wiley, V., Hammond, J., & Carpenter, K. (2003). Screening newborns for inborn errors of metabolism by tandem mass spectrometry. The New England journal of medicine, 348(23), 2304–2312.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.