
Metabolomics (2023) 19:49

https://doi.org/10.1007/s11306-023-02013-x

ORIGINAL ARTICLE

Application of machine learning tools and integrated OMICS for screening and diagnosis of inborn errors of metabolism
Ganni Usha Rani1 · Srilatha Kadali1 · Banka Kurma Reddy1 · Dudekula Shaheena1 · Shaik Mohammad Naushad1

Received: 31 October 2022 / Accepted: 20 April 2023 / Published online: 3 May 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
Introduction Tandem mass spectrometry (TMS) has emerged as an important screening tool for various metabolic disorders in newborns. However, there is an inherent risk of false-positive outcomes.
Objective To establish analyte-specific cut-offs in TMS by integrating metabolomics and genomics data to avoid false positivity and false negativity and improve its clinical utility.
Methods TMS was performed on 572 healthy and 3000 referred newborns. Urine organic acid analysis identified 23 types of inborn errors in 99 referred newborns. Whole exome sequencing was performed in 30 positive cases. The impact of physiological variables such as age, gender, and birth weight on various analytes was explored in healthy newborns. Machine learning tools were used to integrate demographic data with metabolomics and genomics data in order to establish disease-specific cut-offs; identify primary and secondary markers; build classification and regression trees (CART) for better differential diagnosis; and model pathways.
Results This integration helped in differentiating B12 deficiency from methylmalonic acidemia (MMA) and propionic acidemia (Phi coefficient = 0.93); differentiating transient tyrosinemia from tyrosinemia type 1 (Phi coefficient = 1.00); obtaining clues about the possible molecular defect in MMA to initiate appropriate intervention (Phi coefficient = 1.00); and linking pathogenicity scores with the metabolomics profile in tyrosinemia (r² = 0.92). The CART model helped in establishing the differential diagnosis of urea cycle disorders (Phi coefficient = 1.00).
Conclusion Calibrated cut-offs of different analytes in TMS and machine learning-based establishment of disease-specific thresholds of these markers through integrated OMICS have helped in improved differential diagnosis with significant reduction of the false positivity and false negativity rates.

Keywords Inborn errors of metabolism · Newborn screening · Tandem mass spectrometry · Machine learning ·
Integrated OMICS · Cut-off values

Shaik Mohammad Naushad
naushadsm@gmail.com; naushad@yodalifeline.in

1 Department of Biochemical Genetics, YODA Lifeline Diagnostics Pvt. Ltd, Ameerpet, Hyderabad 500016, India

1 Introduction

Newborn screening (NBS) is aimed at detecting most of the inborn errors of metabolism (IEM), including inherited congenital endocrinopathies and metabolic disorders, at a very early stage. These diseases are caused by enzyme or cofactor deficiencies that lead to the accumulation of toxic metabolites, which may cause irreversible organ damage in the first few days of life. Early diagnosis and timely intervention prevent the mortality and morbidity associated with these disorders.

The spectrum of disorders that can be screened by NBS has increased tremendously after the introduction of tandem mass spectrometry (TMS) (Harms et al., 2011), thus covering aminoacidopathies, fatty acid oxidation defects (FAOD), carnitine cycle defects, and organic acidemias (Wilcken et al., 2003). The efficiency of NBS depends on the precision in establishing analyte-specific thresholds that distinguish physiological changes from pathological changes. This precision can be attained through continuous learning from the data of confirmatory tests such as plasma amino acids and urine organic acids (Dietzen et al., 2009). Certain analytes in TMS are elevated in multiple metabolic disorders, where the differential diagnosis is possible only through confirmatory testing or genomics.

Although there is no government-sponsored public health program for mandatory NBS in India, a few studies

have helped in identifying the high prevalence of treatable metabolic disorders in India. The initial study was conducted in the 1980s based on thin layer chromatography (TLC), where 98,256 newborns were tested and 46 amino acid disorders were identified, giving an incidence of 1 in 2136 for amino acidopathies (Rao et al., 1988). The first systematic NBS was conducted in Andhra Pradesh, where 20,000 newborns born in all major government hospitals of Hyderabad were tested for treatable IEMs. TLC and High-Performance Liquid Chromatography (HPLC) were used for amino acid analysis. This study showed an incidence of 1:3660 for amino acidopathies (Rama Devi et al., 2004). TMS-based NBS has been popular in the last decade. An initial study on 2550 clinically symptomatic children showed a 3.2% positivity rate, with 54% of positive cases showing amino acidopathies, 41.6% showing organic acidemias, and 4.4% showing FAOD (Nagaraja et al., 2010). A population-based study representing 4946 newborns from the rural population of Andhra Pradesh identified 5 positive cases, giving an incidence of approximately 1:1000. The identified disorders were carnitine uptake disease, isovaleric acidemia, glutaric aciduria type-I, and glutaric aciduria type-II (Sahai et al., 2011). Although TMS is currently being performed in many corporate hospitals, there are no comprehensive studies that include confirmatory and genetic tests for a better genotype-phenotype correlation. In the current study, we analyzed all screen-positive cases further using Gas Chromatography-Mass Spectrometry (GC-MS) for organic acid analysis. Whole exome sequencing (WES) was performed to identify the causative genetic defects, subject to the consent of the parents. Machine learning (ML) tools were applied to correlate metabolomics data with genomics data. Such correlation was explored for establishing differential diagnosis criteria for certain metabolic disorders. It would therefore be possible to initiate therapy without waiting for the genetic report so that any damage to the organs can be avoided.

In the current study, we have calibrated the cut-offs of different analytes in TMS and used ML tools to provide meaningful insights toward differential diagnosis through integrated OMICS. This approach minimizes false positives and false negatives and guides toward a specific diagnosis.

2 Materials and methods

2.1 Study subjects

The baseline data were established on the basis of 572 newborns (0–7 months), 234 females and 338 males, who were healthy and had no clinical symptoms suggestive of metabolic disease during 3–4 months of follow-up. The number of participants in each subgroup based on age and gender is tabulated in Supplementary Table 1. The mean birth weight was 2.72 ± 0.71 kg. Subsequently, 3000 newborn samples referred from different hospitals were screened for IEMs using TMS. Pre-analytical requirements, i.e., sample collection, storage, and transportation, and analytical requirements such as quality control and reagent stability were taken care of. Samples that did not pass pre-analytical and analytical criteria were excluded from the analysis. Whatman 903 grade filter paper was used for collecting blood samples. The dried blood spot (DBS) samples were collected by heel prick after 48 h of birth, following two feeds from the mother.

Urine samples were collected from patients with informed consent from their parents. Frozen urine samples or urine-soaked filter papers were recommended for samples received from outstation locations.

2.2 LC-MS/MS analysis

The derivatized amino acid and acylcarnitine reagent kit (ClinSpot® LC-MS/MS Complete Kit) was purchased from Recipe (Part no. MS10000). This analytical method directly determines 49 analytes (free carnitine, 35 acylcarnitines, and 13 amino acids), without chromatographic separation on an HPLC column, via tandem mass spectrometry (MS/MS) at a constant flow rate of 0.1 ml/min. The runtime was 1 min with an isocratic program. The detection was performed on an LC-MS/MS system (LCMS-8045, Shimadzu Co., Ltd., Japan), which was operated in positive ion mode (capillary voltage 2.0 kV, desolvation temperature 526 °C). Internal standards were used to calculate the analyte concentrations of the sample. Quantification of analytes was achieved using peak areas that were processed using Neonatal software.

2.3 GC/MS analysis

Urine organic acid analysis was performed as per the published protocols (Tanaka et al., 1980; Kimura et al., 1999). Hydroxylamine hydrochloride, margaric acid, hydrochloric acid, ethyl acetate, sodium sulphate, and sodium hydroxide were purchased from Merck. Hydrocarbon mixture (C10-26) was from GL Sciences, and urease, N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) with 1% trimethylchlorosilane (TMCS), and tropic acid were procured from Sigma Aldrich.

The GC/MS system (Nexis GC-2030, GCMS-QP 2020 NX; Shimadzu Co., Ltd., Japan) was fitted with a fused silica Rtx-5MS capillary column (30 m × 0.25 mm) with a 0.25 μm film thickness of diphenyl dimethyl polysiloxane. Standard electron impact ionization scanning produced mass spectra


in the range of m/z 35–600 at the rate of 0.4 s/cycle. The temperature program was started at 100 °C with initial holding for 4 min and was increased at the rate of 4 °C/min to 290 °C, with final holding for 10 min. The temperatures of the injection port and interface were both 280 °C. The flow rate of the helium carrier was 1.5 ml/min, and the linear velocity was 40.2 m/sec. A final derivatized aliquot of 1 µl was injected into the GC/MS in splitless mode. Data acquisition was performed in the scan mode using the mass-to-charge ratio. The run start time was 2.08 min and the end time was 60.86 min, with the ion source temperature at 200 °C.

2.4 Whole exome sequencing

WES was performed on the next-generation sequencing platform using the MGI machine (DNBSEQ-G50). The Twist Bioscience Kit was used for preparing libraries for WES. The concentration of each library was determined using the QUBIT 2.0 Fluorometer. Samples were pooled before sequencing with each sample at a final concentration of 1.8 pM. Sequencing was performed on the DNBSEQ-G50 platform using 150 cycles, paired-end chemistry, targeting 100× coverage.

2.5 Machine learning algorithm: classification and regression tree (CART) model

Demographic and metabolic data were used as the input variables and the diagnosis was used as the output to identify disease-specific thresholds of acylcarnitines and amino acids. The CART model was built as proposed by Leo Breiman. This is a binary tree algorithm with the most important determinant at the apex of the tree and subsequent branches formed with other variables in descending order of importance. Each decision node represented a single input variable. Branching was based on a specific threshold of that variable that predicts the outcome variable reasonably well. The extent of branching was optimized by pruning the tree to achieve the required prediction with minimal branching. If multiple markers were suggestive of a group of metabolic disorders, differential diagnostic strategies were deduced on the basis of these decision trees. The CART model was validated using cases where molecular confirmation was available.

2.6 Statistical analysis

The validity of each decision level of the CART model was assessed on the basis of the number of True Positives, True Negatives, False Positives, and False Negatives computed in a 2 × 2 contingency table. Fisher's exact test was performed for the calculation of the performance characteristics of the model. We used Student's t-test for analyzing the impact of gender differences on each analyte. One-way ANOVA was used to assess the changes in metabolites with growing age (from birth to 6–7 months of life). Computational platforms like https://statpages.info/ and https://www.socscistatistics.com were employed for the analysis.

3 Results

3.1 Establishing population-specific reference ranges for tandem mass spectrometry

The data from 572 healthy newborns were used to establish the population-specific reference ranges (1st to 99th percentile) of acylcarnitines and amino acids. The upper limit of each analyte in controls was considered as the cut-off. Our cut-offs showed good correlation (r² = 0.906) with the cut-offs of the Centers for Disease Control and Prevention (CDC).

Arginine, citrulline, and valine showed a positive correlation with age. C2, C16, and C16:1 acylcarnitines showed an inverse association with age, while C8 and C4DC showed a positive association. No statistically significant gender differences were observed in the amino acid and acylcarnitine profiles. Except for C16 and C6DC, none of the analytes showed a statistically significant association with birth weight after Bonferroni correction (Table 1).

Gender differences in the distribution of glutamic acid, glycine, C5, C6DC, C16OH, C14:1, and C16:1OH were observed (Supplementary Table 2). Additionally, the reference ranges changed significantly from 1 month to 6 months of age in infancy, specifically for the analytes aspartic acid, citrulline, glutamic acid, glycine, phenylalanine, C16, C18, C4OH, C4DC, C16:1, and C18:2 in females (Supplementary Table 3). Similarly, arginine, aspartic acid, glutamic acid, glycine, leucine, ornithine, phenylalanine, C5DC, C16, C18, C4DC, C16OH, C14:1, C16:1, C18:2, and C18:1 were affected by age in males (Supplementary Table 4). These differences can be attributed to altered metabolic rates with age.

3.2 Incidence of IEMs among referred cases

The screen-positive cases on TMS were further confirmed by urine organic acid analysis on GC/MS. The metabolite elevations in the urine were consistent with the findings observed on TMS. Among amino acid disorders, tyrosinemia and maple syrup urine disease were the most common, followed by hypermethioninemia, urea cycle disorder, and phenylketonuria (PKU). Among organic acidurias, propionic acidemia (PA) and glutaric acidemia type-I (GA-1)


Table 1 Indian population-specific cut-offs of amino acids and acylcarnitines and their correlation with physiological variables

Analyte | Min | Max | Correlation coefficient: Age | Correlation coefficient: Male vs. female | Correlation coefficient: Birth weight
Alanine | 72.26 | 662.85 | 0.02 | 0.03 | -0.05
Arginine | 1.21 | 80.70 | 0.27* | 0.07 | -0.10
Aspartic acid | 12.20 | 173.29 | 0.12 | 0.01 | -0.10
Citrulline | 4.12 | 60.53 | 0.28* | -0.01 | -0.09
Glutamic acid | 113.30 | 594.93 | -0.18* | -0.03 | 0.02
Glycine | 119.39 | 702.37 | -0.12 | -0.01 | 0.04
Leucine | 50.50 | 388.27 | -0.05 | -0.04 | 0.09
Methionine | 7.64 | 61.22 | -0.01 | -0.01 | 0.03
Ornithine | 17.03 | 402.79 | 0.07 | 0.02 | -0.10
Phenylalanine | 20.28 | 118.65 | 0.05 | -0.05 | 0.05
Proline | 70.82 | 335.90 | -0.04 | 0.01 | 0.02
Tyrosine | 13.63 | 234.72 | -0.08 | -0.08 | -0.02
Valine | 36.10 | 318.57 | 0.16* | 0.01 | 0.11
Free Carnitine (C0) | 4.69 | 88.30 | 0.00 | -0.01 | -0.04
Acetyl Carnitine (C2) | 3.61 | 96.95 | -0.16* | 0.05 | 0.08
Propionyl Carnitine (C3) | 0.25 | 5.45 | -0.03 | 0.03 | 0.12
Butyryl Carnitine (C4) | 0.07 | 1.76 | -0.01 | -0.05 | -0.03
Isovaleryl Carnitine (C5) | 0.04 | 0.78 | 0.03 | 0.04 | -0.07
Glutaryl Carnitine (C5DC) | 0.01 | 0.46 | -0.08 | -0.07 | 0.04
Hexanoyl Carnitine (C6) | 0.00 | 0.53 | 0.02 | -0.03 | -0.03
Octanoyl Carnitine (C8) | 0.02 | 0.65 | 0.14* | -0.03 | 0.00
Decanoyl Carnitine (C10) | 0.02 | 0.50 | -0.06 | -0.01 | -0.01
Dodecanoyl Carnitine (C12) | 0.02 | 1.75 | -0.05 | 0.04 | -0.01
Tetradecanoyl Carnitine (C14) | 0.03 | 0.75 | -0.10 | 0.05 | 0.01
Palmitoyl Carnitine (C16) | 0.35 | 7.5 | -0.20* | 0.01 | 0.14*
Stearoyl Carnitine (C18) | 0.15 | 3.13 | -0.08 | 0.03 | 0.00
Malonyl Carnitine (C3DC) | 0.01 | 0.25 | -0.01 | 0.03 | 0.11
3-Hydroxybutyryl Carnitine (C4OH) | 0.03 | 0.65 | -0.09 | 0.01 | 0.05
Methylmalonyl Carnitine (C4DC) | 0.06 | 0.84 | 0.25* | -0.01 | 0.01
3-Hydroxyisovaleryl Carnitine (C5OH) | 0.03 | 0.53 | 0.06 | 0.05 | 0.04
Methylglutarylcarnitine (C6DC) | 0.00 | 0.11 | -0.10 | -0.04 | 0.14*
Octenoylcarnitine (C8:1) | 0.01 | 0.37 | 0.04 | 0.04 | -0.09
3-Hydroxypalmitoyl Carnitine (C16OH) | 0.00 | 0.13 | -0.07 | 0.03 | 0.00
3-Hydroxystearoyl Carnitine (C18OH) | 0.00 | 0.09 | -0.04 | 0.00 | -0.01
3-Methylcrotonylcarnitine (C5:1) | 0.01 | 0.26 | -0.04 | -0.02 | 0.04
Decenoyl Carnitine (C10:1) | 0.01 | 0.26 | 0.07 | -0.02 | -0.03
Decadienoyl Carnitine (C10:2) | 0.00 | 0.08 | -0.04 | 0.07 | -0.05
Dodecenoylcarnitine (C12:1) | 0.03 | 0.78 | -0.10 | -0.05 | 0.09
Tetradecadienoylcarnitine (C14:2) | 0.00 | 0.62 | 0.06 | 0.03 | -0.10
Tetradecenoyl Carnitine (C14:1) | 0.01 | 0.86 | -0.09 | -0.01 | 0.04
3-OH-Tetradecanoylcarnitine (C14OH) | 0.00 | 0.14 | -0.10 | 0.01 | 0.02
Hexadecenoylcarnitine (C16:1) | 0.03 | 0.94 | -0.18* | -0.03 | 0.10
3-Hydroxypalmitoleylcarnitine (C16:1OH) | 0.01 | 0.23 | 0.03 | 0.02 | -0.03
Octadecadienoylcarnitine (C18:2) | 0.03 | 2.28 | 0.04 | 0.01 | -0.11
3-Hydroxylinoleoylcarnitine (C18:2OH) | 0.00 | 1.60 | -0.02 | -0.11 | 0.01
3-OH-Octadecenoylcarnitine (C18:1OH) | 0.01 | 0.15 | -0.03 | 0.01 | -0.04
Oleoyl Carnitine (C18:1) | 0.29 | 3.29 | -0.08 | -0.01 | -0.02

* p-values are statistically significant after Bonferroni correction

are the most common, followed by methylmalonic acidemia (MMA), beta-keto thiolase deficiency, isovaleric acidemia (IVA), carnitine uptake defect, and 3-hydroxy-3-methylglutaryl-CoA lyase deficiency (HMG-CoA lyase deficiency). Among FAODs, medium-chain acyl-CoA dehydrogenase (MCAD) deficiency is the most common, followed by carnitine palmitoyltransferase I (CPT I) deficiency, long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) deficiency, and short-chain 3-hydroxyacyl-CoA dehydrogenase (SCHAD) deficiency. Among urea cycle disorders, argininosuccinate synthetase 1 (ASS1) deficiency is the most common, followed by argininemia (Table 2).
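As a rough illustration of how incidence figures of the form 1:N can be derived for the referred cohort, the sketch below assumes that N is simply the number of screened newborns (3000) divided by the number of confirmed cases, rounded to the nearest integer. The function name `incidence_ratio` is hypothetical, and published tables may round some entries differently, so small differences from this direct computation are possible.

```python
def incidence_ratio(confirmed_cases: int, screened: int = 3000) -> str:
    """Express incidence as '1:N', assuming N = screened / confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be a positive integer")
    # Round to the nearest whole number of newborns per confirmed case.
    return f"1:{round(screened / confirmed_cases)}"


if __name__ == "__main__":
    # Case counts taken from the referred cohort (n = 3000).
    for disorder, cases in [("Phenylketonuria", 2), ("MSUD", 5), ("All IEMs", 99)]:
        print(disorder, incidence_ratio(cases))
```

Because the cohort consists of referred rather than unselected newborns, ratios computed this way describe the referred population only.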


Table 2 Incidence rate of inborn errors of metabolism (IEMs) (n = 3000)

Disorder | No. of confirmed cases | Incidence
Amino acid disorders
Phenylketonuria | 2 | 1:1500
Maple syrup urine disease (MSUD) | 5 | 1:600
Argininemia | 4 | 1:750
Citrullinemia | 2 | 1:1500
Hypermethioninemia | 4 | 1:750
Tyrosinemia | 7 | 1:430
Organic acid disorders
Carnitine uptake defect (CUD) | 2 | 1:1500
Propionic aciduria (PA) | 8 | 1:360
Methylmalonic aciduria (MMA) | 6 | 1:500
Isovaleric acidemia (IVA) | 2 | 1:1500
Beta keto thiolase (BKT) | 5 | 1:600
Glutaric aciduria type-1 (GA-1) | 8 | 1:360
3-hydroxy-3-methylglutaryl-CoA lyase (HMG-CoA lyase) deficiency | 2 | 1:1500
Fatty acid oxidation disorders
Medium-chain acyl-CoA dehydrogenase (MCAD) deficiency | 9 | 1:330
Carnitine palmitoyltransferase I (CPT I) deficiency | 3 | 1:1000
Long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) deficiency | 1 | 1:3000
Short-chain acyl-CoA dehydrogenase (SCAD) deficiency | 1 | 1:3000
Urea cycle disorders
N-acetylglutamate synthetase (NAGS) deficiency | 1 | 1:3000
Carbamoyl phosphate synthetase I (CPS1) deficiency | 1 | 1:3000
Ornithine transcarbamylase (OTC) deficiency | 2 | 1:1500
Argininosuccinate synthetase 1 (ASS1) deficiency | 11 | 1:272
Argininosuccinate lyase deficiency | 4 | 1:750
Argininemia | 9 | 1:330
Total | 99 | 1:30

3.3 Utility of ML in the differential diagnosis of IEMs

C3 elevation is the most common abnormality observed in Indian children. Through ML, we have identified different disease-specific thresholds for C3 and the C3/C2 ratio that help in distinguishing normal children from those with vitamin B12 deficiency and MMA or PA (Phi coefficient = 0.93). C3 < 5.22 µmol/L and a C3/C2 ratio in the range of 0.38–0.68 suggest vitamin B12 deficiency. Patients with MMA have C3 > 5.22 µmol/L and a C3/C2 ratio between 0.68 and 3.2. In PA patients, the level of C3 is > 7.38 µmol/L and the C3/C2 ratio is 0.72–4.29. TMS analysis alone cannot differentiate MMA and PA. However, urine organic acid analysis by GC/MS will help in the differential diagnosis. These ML-derived thresholds showed 87.5% accuracy in diagnosing B12 deficiency and 100% accuracy in segregating the data into normal, MMA, or PA, with an overall precision of 98.68% (Fig. 1). In GA-1 and IVA patients, the cut-off values of C5DC and C5 were > 0.6 µmol/L and > 2.18 µmol/L, respectively.

The most frequent amino acid abnormality in Indian newborns is tyrosine elevation. We have established thresholds of tyrosine and methionine that can differentiate transient tyrosinemia from classical tyrosinemia type I (Phi coefficient: 1.00). Tyrosine > 235 µmol/L and methionine > 34 µmol/L indicate tyrosinemia type I (Fig. 2). Tyrosine levels between 167 and 234.72 µmol/L and methionine levels between 17 and 61.2 µmol/L indicate transient tyrosinemia.

3.4 Differential diagnosis of Urea Cycle Disorders (UCDs)

During our study, we identified 28 UCDs with molecular confirmation. Two cases with low levels of citrulline and arginine together with orotic aciduria were diagnosed as ornithine transcarbamylase deficiency (OTC c.533C>T, OTC c.275G>A). Two cases with a normal urine organic acid profile had NAGS (c.702-4delA) and CPS1 (c.446G>A) deficiencies. The most frequent UCD is citrullinemia. Citrulline > 417.5 µmol/L indicated ASS1 deficiency (c.1088G>A, c.1168G>A). With citrulline < 417.5 µmol/L, glutamine could differentiate between ASS1 (n = 11) and ASL (n = 4) deficiency, where ASL deficiency showed glutamine > 376.95 µmol/L. In argininemia (n = 9), the disease-specific threshold for arginine was > 195.7 µmol/L (ARG1 c.899C>G).

3.5 Integration of OMICS in the differential diagnosis of IEMs by ML

Integration of demographic data with metabolomics and genomics using ML has given subtype-specific thresholds for further differentiation of MMA cases. If the age of onset is > 27 months, the most likely MMA subtype is CblC deficiency. If the age of onset is < 9 months and the C3/C2 ratio is < 10.35, the subtype is most likely to be MUT deficiency, whereas C3/C2 > 10.35 indicates MUT or CblB deficiency. If the age of onset is between 9 and 27 months, MMA elevation < 221-fold suggests CblA deficiency. MMA elevation > 221-fold with a C3/C2 ratio between 11.89 and 12.22 is also suggestive of CblA deficiency. In MUT deficiency, the C3/C2 ratio is > 12.33 with significant elevation of MMA. This prediction model showed 100% specificity and sensitivity in the differential diagnosis of MMA (Fig. 3). The sensitivity and specificity of the standard procedure used in our laboratory were 95.8% and 98.7%, respectively. The ML-based approach improved the sensitivity and specificity to 98.2% and 99.8%, respectively.
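Performance figures such as sensitivity, specificity, and the Phi coefficient are all derived from the counts in a 2 × 2 contingency table. The following is a minimal sketch of those standard formulas, not the computation used in this study; the function name `screening_metrics` and the example counts are hypothetical.

```python
import math


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and Phi coefficient from a 2 x 2 table.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives for one decision threshold.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one affected and one unaffected case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Phi = (TP*TN - FP*FN) / sqrt of the product of the four marginal totals.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denominator if denominator else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity, "phi": phi}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(screening_metrics(tp=8, fp=1, fn=2, tn=9))
```

A Phi coefficient of 1.00 corresponds to a table with no false positives and no false negatives, which is why perfect separation at a threshold is reported that way.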


Fig. 1 Utility of C3 and C3/C2 ratio in the differential diagnosis of B12 deficiency, methylmalonic acidemia, and propionic acidemia
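The C3 and C3/C2 thresholds reported in Sect. 3.3 can be written as a small rule-based sketch. This is an illustrative transcription under stated assumptions, not the authors' CART model: the function name `classify_c3` and the returned labels are hypothetical, range boundaries are treated as inclusive, and inputs that match no reported rule are left unclassified rather than called normal.

```python
def classify_c3(c3_umol_l: float, c3_c2_ratio: float) -> str:
    """Apply the reported C3 (umol/L) and C3/C2 thresholds as simple rules."""
    # Vitamin B12 deficiency: C3 < 5.22 umol/L with C3/C2 in 0.38-0.68.
    if c3_umol_l < 5.22 and 0.38 <= c3_c2_ratio <= 0.68:
        return "suggestive of vitamin B12 deficiency"
    # MMA: C3 > 5.22 umol/L with C3/C2 between 0.68 and 3.2.
    mma = c3_umol_l > 5.22 and 0.68 <= c3_c2_ratio <= 3.2
    # PA: C3 > 7.38 umol/L with C3/C2 between 0.72 and 4.29.
    pa = c3_umol_l > 7.38 and 0.72 <= c3_c2_ratio <= 4.29
    if mma and pa:
        # The ranges overlap, so TMS alone cannot separate the two.
        return "MMA or PA (urine organic acid analysis needed)"
    if mma:
        return "suggestive of MMA"
    if pa:
        return "suggestive of PA"
    return "no rule matched"


if __name__ == "__main__":
    # Hypothetical analyte values, for illustration only.
    print(classify_c3(3.0, 0.5))
    print(classify_c3(8.0, 1.0))
```

The overlapping MMA and PA ranges are kept as a combined outcome on purpose, mirroring the statement that confirmatory GC/MS analysis is required to tell them apart.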

3.6 Pathway modelling through the integration of genomics and metabolomics

We used the SNAP2 scores of the FAH mutations found in nine tyrosinemia type I cases, namely c.192G>T (n = 3), c.941T>C (n = 1), c.983A>G (n = 1), c.998delA (n = 1), c.1159G>A (n = 1), c.1211G>A (n = 1), and c.709C>T (n = 1), as an index of pathogenicity. We developed a multiple linear regression model with these pathogenicity scores as output and various metabolites of the tyrosine metabolic pathway, namely plasma tyrosine, urinary succinyl acetone, urinary 4-hydroxy phenyl pyruvic acid (PHPPA), and urinary 4-hydroxy phenyl lactic acid (PHPLA), as input variables. The deduced equation is as follows:

FAH SNAP2 score = 2.6107 + (0.1362 × Tyrosine) + (0.0005 × Succinyl acetone) + (0.0426 × PHPLA) + (0.0133 × PHPPA)

This equation explained 91.94% of the variability in the pathogenicity score based on the metabolomic profile (Fig. 4).

4 Discussion

The cut-offs established by the current study showed good correlation (r² = 0.846) with the cut-offs derived from rural newborn data from India (Sahai et al., 2011). Similarly, the cut-offs showed good correlation with the CDC cut-offs (r² = 0.906). This is the first study from India demonstrating physiological differences in amino acids and acylcarnitines based on gender and age. Our results are consistent with the findings of a Canadian study that reported significant physiological changes in 50% of amino acids and 70% of acylcarnitines during the neonatal period (Teodoro-Morrison et al., 2015). NBS in Southwest Colombia also reported age-related differences in amino acids and acylcarnitines, whereas no significant differences by gender were observed (Céspedes et al., 2017). Our results are consistent with a study that demonstrated a positive association of branched-chain and basic amino acids with increasing age, highlighting the requirement for these essential amino acids during growth (Hammarqvist et al., 2010). Furthermore, the association of the acylcarnitine profile with age is corroborated by the findings of Cavedon et al. (2005), who demonstrated a significant lowering of several acylcarnitines with age. Similar to the findings of Céspedes et al. (2017), no gender differences were observed in


Fig. 2 Utility of tyrosine and methionine levels in distinguishing transient tyrosinemia from tyrosinemia type 1. Tyrosine (Tyr) and methionine (Meth) levels were able to distinguish transient tyrosinemia from tyrosinemia type 1. WES analysis of the nine tyrosinemia type I cases revealed the following FAH mutations: c.192G>T, c.941T>C, c.983A>G, c.998delA, c.1159G>A, c.1211G>A, and c.709C>T. Three cases exhibited the c.192G>T mutation.
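The tyrosine and methionine thresholds reported in the Results can be sketched as two simple rules. This is only an illustration of the reported cut-offs: the function name `classify_tyrosinemia` and the labels are hypothetical, range limits are assumed to be inclusive, and values outside both reported patterns are left unclassified.

```python
def classify_tyrosinemia(tyrosine_umol_l: float, methionine_umol_l: float) -> str:
    """Apply the reported tyrosine/methionine thresholds (both in umol/L)."""
    # Tyrosinemia type I: tyrosine > 235 and methionine > 34.
    if tyrosine_umol_l > 235 and methionine_umol_l > 34:
        return "tyrosinemia type I"
    # Transient tyrosinemia: tyrosine 167-234.72 and methionine 17-61.2.
    if 167 <= tyrosine_umol_l <= 234.72 and 17 <= methionine_umol_l <= 61.2:
        return "transient tyrosinemia"
    return "no rule matched"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(classify_tyrosinemia(300.0, 50.0))
    print(classify_tyrosinemia(200.0, 30.0))
```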

the amino acid and acylcarnitine profile. The positivity rate in referred cases in our study was 2.37%, while the positivity rate was 1.4% in another large-scale study (Babu et al., 2015). Our study is in agreement with that of Hampe et al. (2017) in identifying methylmalonic aciduria, glutaric acidemia type 1, propionic aciduria, maple syrup urine disease, phenylketonuria, and tyrosinemia as the most frequent IEMs. Only two analytes, C16 and C6DC, showed an association with birth weight. However, neither of these markers is individually specific for any metabolic disorder. C16 elevation along with C18, C18:1, and C18:2 is an indicator of CPT II deficiency or carnitine/acylcarnitine translocase deficiency. Because both primary and secondary markers are considered, the diagnosis will not be affected by birth weight.

Classification and regression trees based on metabolite levels and their mutual ratios have been used in newborn screening for a long time. C8, C10, and C8/C2 were used for detecting medium-chain acyl-CoA dehydrogenase (MCAD) deficiency in a study from Belgium (Van den Bulcke et al., 2011). Phenylalanine hydroxylase deficiency, MCAD deficiency, and methylcrotonyl-CoA carboxylase (3-MCC) deficiency were detected with > 95.2% sensitivity and a false-positive rate < 0.001% with machine learning (Baumgartner et al., 2005).

We have used ML tools to reduce the false positivity rate in NBS by defining disease-specific thresholds and by identifying primary and secondary markers for differential diagnosis. A recent study also used a similar approach with a Random Forest machine learning classifier that reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCAD deficiency by 2% (Peng et al., 2020). ML models based on diagnostic cut-offs reduced the number of false-positive cases from 21 to 2 for phenylketonuria, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency (Chen et al., 2013).

The utility of integrating metabolomics data with genomics has been investigated earlier by coupling untargeted


Fig. 3 Integration of OMICs in differential diagnosis of IEMs with machine learning tools. Mutations in MUT, CblA, CblB, and CblC have been identified in established cases of MMA. In all these cases, confirmation data were available, which formed the basis to develop a classification and regression model. Age of onset is the most significant determinant, with CblC deficiency manifesting after 27 months. MUT and CblB mutations were early onset (< 9 months). If the age of onset is between 9 and 27 months, MMA levels and the C3/C2 ratio can differentiate MUT and CblA deficiency. CblA deficiency is associated with < 221-fold MMA elevation; if the elevation is > 221-fold, the C3/C2 ratio is between 11.89 and 12.33 in CblA deficiency, whereas it is > 12.33 in MUT deficiency. 1, 2, 3, and 4 represent key decision thresholds. MUT: p.A668P, p.R532H; p.A376Sfs*6, p.R511*; CblA: p.Y24*, c.562+1_562+2insT; CblB: p.R191Q, p.D112*, p.R191W; CblC: p.R132*
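The decision thresholds described for Fig. 3 can be expressed as nested rules. The sketch below is a hand-written transcription under stated assumptions, not the fitted CART model: the function name `mma_subtype` and its labels are hypothetical, the 11.89–12.33 C3/C2 range is taken from the figure caption, the 10.35 split for early-onset cases is taken from Sect. 3.5, and any combination not covered by the reported rules is returned as indeterminate.

```python
def mma_subtype(age_onset_months: float, c3_c2_ratio: float,
                mma_fold_elevation: float) -> str:
    """Suggest an MMA subtype from age of onset, C3/C2 and urinary MMA fold elevation."""
    # Decision 1: late onset (> 27 months) points to CblC deficiency.
    if age_onset_months > 27:
        return "CblC"
    # Decision 2: early onset (< 9 months) is MUT or CblB, split on C3/C2.
    if age_onset_months < 9:
        return "MUT" if c3_c2_ratio < 10.35 else "MUT or CblB"
    # Decision 3: onset between 9 and 27 months, modest MMA elevation.
    if mma_fold_elevation < 221:
        return "CblA"
    # Decision 4: marked MMA elevation, split on C3/C2.
    if c3_c2_ratio > 12.33:
        return "MUT"
    if 11.89 <= c3_c2_ratio <= 12.33:
        return "CblA"
    return "indeterminate"


if __name__ == "__main__":
    # Hypothetical patient values, for illustration only.
    print(mma_subtype(age_onset_months=12, c3_c2_ratio=12.5, mma_fold_elevation=300))
```

Returning an explicit indeterminate outcome keeps the sketch from assigning a subtype where the reported thresholds say nothing.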

Fig. 4 Pathogenicity prediction model for FAH gene mutations using SNAP2 score as an index of pathogenicity
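The regression equation reported in Sect. 3.6 for the FAH SNAP2 score can be evaluated directly. The sketch below only encodes the published intercept and coefficients; the function name `predict_fah_snap2` is hypothetical, the measurement units of the four metabolites are not stated in the text and are assumed to match those used when the model was fitted, and the example inputs are invented for illustration.

```python
def predict_fah_snap2(tyrosine: float, succinyl_acetone: float,
                      phpla: float, phppa: float) -> float:
    """Evaluate the reported linear model for the FAH SNAP2 score."""
    return (2.6107
            + 0.1362 * tyrosine          # plasma tyrosine
            + 0.0005 * succinyl_acetone  # urinary succinyl acetone
            + 0.0426 * phpla             # urinary 4-hydroxy phenyl lactic acid
            + 0.0133 * phppa)            # urinary 4-hydroxy phenyl pyruvic acid


if __name__ == "__main__":
    # Hypothetical metabolite values, for illustration only.
    print(round(predict_fah_snap2(300.0, 1000.0, 200.0, 150.0), 4))
```

Since the model was derived from nine cases, a sketch like this is best read as a way to inspect the relative weight of each metabolite rather than as a validated predictor.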

metabolomics upstream or downstream to the primary reaction with in silico-simulated WES results to increase the diagnostic value (Kerkhofs et al., 2020). We have developed a pathogenicity score prediction model using metabolomics data in patients with tyrosinemia. In the current study, we have integrated TMS, GC-MS, and genomics data to derive an ML model capable of differential diagnosis. This model could give possible clues about the subtypes of methylmalonic acidemia based on the age of onset, C3/C2 ratio, and urinary methylmalonic acid content. This information will be of clinical utility in initiating the therapy as early as possible, as waiting for the molecular report may adversely affect the


prognosis. A Korean study used a 259-gene targeted NGS panel along with TMS to improve the diagnostic yield of IEMs (Ko et al., 2018) and showed 100% concordance in the results.

The major strengths of the current study are (i) integration of NBS, metabolomics, and genomics to improve differential diagnostic potential; and (ii) evaluation of the influence of physiological variables, namely age, gender, and birth weight, on various analytes. Integrated OMICS gave more insights into differential markers both in screening and confirmation, assisting in a specific diagnosis and early intervention.

The incidence depicted in this study may not be representative of population incidence, as these subjects were referred from various hospitals and were not part of a mandatory NBS program. Hence, the sample size is limited compared to population-based screening programs.

5 Conclusions

To conclude, this study provided baseline data for acylcarnitines and amino acids in Indian newborns and explored the physiological changes attributed to age, gender, and birth weight. Integration of screening with confirmatory panels such as urine organic acid analysis and whole exome sequencing helped in establishing unique genotype-metabolic phenotype correlations, thus facilitating informed clinical decisions. Compared to conventional NBS programs, ML-based integration offers high specificity and sensitivity in detecting these IEMs.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11306-023-02013-x.

Acknowledgements We would like to acknowledge all the families who participated in the study. We thank all the clinicians who have referred samples to YODA Lifeline Diagnostics Pvt. Ltd., Hyderabad. We thank management for providing the necessary infrastructure for the study.

Informed consent Informed consent was obtained from each participant in the study.

Conflict of Interest The authors declare that they have no conflicts of interest.

References

Babu, R. P., Bishnupriya, G., Thushara, P. K., Alap, C., Cariappa, R., Annapoorani, & Viswanathan, K. (2015). Detection of glutaric acidemia type 1 in infants through tandem mass spectrometry. Molecular genetics and metabolism reports, 3, 75–79.
Baumgartner, C., Böhm, C., & Baumgartner, D. (2005). Modelling of classification rules on metabolic patterns including machine learning and expert knowledge. Journal of biomedical informatics, 38(2), 89–98.
Cavedon, C. T., Bourdoux, P., Mertens, K., Van Thi, H. V., Herremans, N., de Laet, C., & Goyens, P. (2005). Age-related variations in acylcarnitine and free carnitine concentrations measured by tandem mass spectrometry. Clinical chemistry, 51(4), 745–752.
Céspedes, N., Valencia, A., Echeverry, C. A., Arce-Plata, M. I., Colón, C., Castiñeiras, D. E., Hurtado, P. M., Cocho, J. A., Herrera, S., & Arévalo-Herrera, M. (2017). Reference values of amino acids, acylcarnitines and succinylacetone by tandem mass spectrometry for use in newborn screening in southwest Colombia. Colombia medica (Cali, Colombia), 48(3), 113–119.
Chen, W. H., Hsieh, S. L., Hsu, K. P., Chen, H. P., Su, X. Y., Tseng, Y. J., Chien, Y. H., Hwu, W. L., & Lai, F. (2013). Web-based newborn screening system for metabolic diseases: Machine learning versus clinicians. Journal of medical Internet research, 15(5), e98.
Dietzen, D. J., Rinaldo, P., Whitley, R. J., Rhead, W. J., Hannon, W. H., Garg, U. C., Lo, S. F., & Bennett, M. J. (2009). National academy of clinical biochemistry laboratory medicine practice guidelines: Follow-up testing for metabolic disease identified by expanded newborn screening using tandem mass spectrometry; executive summary. Clinical chemistry, 55(9), 1615–1626.
Hammarqvist, F., Angsten, G., Meurling, S., Andersson, K., & Wernerman, J. (2010). Age-related changes of muscle and plasma amino acids in healthy children. Amino acids, 39(2), 359–366.
Hampe, M. H., Panaskar, S. N., Yadav, A. A., & Ingale, P. W. (2017). Gas chromatography/mass spectrometry-based urine metabolome study in children for inborn errors of metabolism: An Indian experience. Clinical biochemistry, 50(3), 121–126.
Kerkhofs, M., Haijes, H. A., Willemsen, A. M., van Gassen, K., van der Ham, M., Gerrits, J., de Sain-van der Velden, M., Prinsen, H., van Deutekom, H., van Hasselt, P. M., Verhoeven-Duif, N. M., & Jans, J. (2020). Cross-Omics: Integrating Genomics with Metabolomics in Clinical Diagnostics. Metabolites, 10(5), 206.
Kimura, M., Yamamoto, T., & Yamaguchi, S. (1999). Automated meta-
Author Contribution URG and SK participated in the study design, in-
bolic profiling and interpretation of GC/MS data for organic aci-
terpretation of data, statistical analysis, and drafting of the manuscript.
demia screening: A personal computer-based system. The Tohoku
BKR and SD were involved in carrying out the experiments. SMN car-
journal of experimental medicine, 188(4), 317–334.
ried out the conception and design of the study, interpretation of data,
Ko, J. M., Park, K. S., Kang, Y., Nam, S. H., Kim, Y., Park, I., Chae, H.
statistical analysis, and final approval of the manuscript.
W., Lee, S. M., Lee, K. A., & Kim, J. W. (2018). A New Integrated
Newborn Screening Workflow can provide a shortcut to Differen-
Declarations tial diagnosis and confirmation of inherited metabolic Diseases.
Yonsei medical journal, 59(5), 652–661.
Compliance with ethical standards All procedures performed in stud- Nagaraja, D., Mamatha, S. N., De, T., & Christopher, R. (2010).
ies involving human participants were in accordance with the ethical Screening for inborn errors of metabolism using automated elec-
standards of the institutional and/or national research committee. This trospray tandem mass spectrometry: Study in high-risk indian
was in accordance with the 1964 Helsinki declaration and its later population. Clinical biochemistry, 43(6), 581–588.
amendments or comparable ethical standards. Peng, G., Tang, Y., Cowan, T. M., Enns, G. M., Zhao, H., & Scharfe,
C. (2020). Reducing false-positive results in Newborn Screening

13
49 Page 10 of 10 G. Usha Rani et al.

using machine learning. International journal of neonatal screen- Van den Bulcke, T., Vanden Broucke, P., Van Hoof, V., Wouters, K.,
ing, 6(1), 16. Vanden Broucke, S., Smits, G., Smits, E., Proesmans, S., Van
Rama Devi, A. R., & Naushad, S. M. (2004). Newborn screening in Genechten, T., & Eyskens, F. (2011). Data mining methods
India. Indian journal of pediatrics, 71(2), 157–160. for classification of Medium-Chain Acyl-CoA dehydrogenase
Rao, N. A., Devi, A. R., Savithri, H. S., Rao, S. V., & Bittles, A. H. deficiency (MCADD) using non-derivatized tandem MS neo-
(1988). Neonatal screening for amino acidaemias in Karnataka, natal screening data. Journal of biomedical informatics, 44(2),
south India. Clinical genetics, 34(1), 60–63. 319–325. Wilcken, B., Wiley, V., Hammond, J., & Carpenter, K.
Sahai, I., Zytkowicz, T., Rao Kotthuri, S., Lakshmi Kotthuri, A., (2003). Screening newborns for inborn errors of metabolism by
Eaton, R. B., & Akella, R. R. (2011). Neonatal screening for tandem mass spectrometry. The New England journal of medi-
inborn errors of metabolism using tandem mass spectrometry: cine, 348(23), 2304–2312.
Experience of the pilot study in Andhra Pradesh, India. Indian
journal of pediatrics, 78(8), 953–960. Publisher’s Note Springer Nature remains neutral with regard to juris-
Tanaka, K., West-Dull, A., Hine, D. G., Lynn, T. B., & Lowe, T. dictional claims in published maps and institutional affiliations.
(1980). Gas-chromatographic method of analysis for urinary
organic acids. II. Description of the procedure, and its application Springer Nature or its licensor (e.g. a society or other partner) holds
to diagnosis of patients with organic acidurias. Clinical chemistry, exclusive rights to this article under a publishing agreement with the
26(13), 1847–1853. author(s) or other rightsholder(s); author self-archiving of the accepted
Teodoro-Morrison, T., Kyriakopoulou, L., Chen, Y. K., Raizman, J. E., manuscript version of this article is solely governed by the terms of
Bevilacqua, V., Chan, M. K., Wan, B., Yazdanpanah, M., Schulze, such publishing agreement and applicable law.
A., & Adeli, K. (2015). Dynamic biological changes in metabolic
disease biomarkers in childhood and adolescence: A CALIPER
study of healthy community children. Clinical biochemistry,
48(13–14), 828–836.
