
Prediction of Idiopathic Recurrent Spontaneous Miscarriage using Machine Learning



Dadoma Sherpa, School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, India (domasherpa27@gmail.com)
Rajwade Dhruva Abhijit, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India (rajwadedhruva@gmail.com)
Imon Mitra, School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, India (imonmitra.sp@gmail.com)
Dhruba Dhar, School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, India (dhruba.dhar2009@gmail.com)
Sunita Sharma, Institute of Reproductive Medicine, Kolkata, India (sunitapalchaudhuri@yahoo.com)
Pratip Chakraborty, Institute of Reproductive Medicine, Kolkata, India (pratip_2011@yahoo.com)
Koel Chaudhury*, School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, India (koel@smst.iitkgp.ac.in)
*Corresponding author

Abstract— Recurrent spontaneous miscarriage (RSM) is defined as the spontaneous loss of two or more clinically diagnosed pregnancies within 20 weeks of gestation. Despite extensive research, the etiology remains undefined in 50% of RSM cases, which are classified as idiopathic. Thus, further study is warranted to understand the molecular mechanisms associated with the disease pathogenesis. In the present study, we aim to identify Raman fingerprints in endometrial/uterine tissues of women with a history of idiopathic recurrent spontaneous miscarriage (IRSM) and controls by performing Raman spectroscopy with chemometric analysis and spectral classification models. Unsupervised analyses such as principal component analysis (PCA) and hierarchical cluster analysis (HCA), and supervised analysis such as orthogonal projections to latent structures discriminant analysis (OPLS-DA), showed a distinct separation between IRSM and controls. The principal component loading plots indicated that proteins, amino acids, cholesterol and glutamate were responsible for the separation between the two groups. The pre-processed Raman spectral data were subjected to eight different machine learning (ML) classifiers with hyperparameter optimization to develop prediction models. Comparing the various algorithms, support vector machine (SVM), decision tree (DT), Extreme Gradient Boosting (XGBoost), convolutional neural network (CNN) and artificial neural network (ANN) outperformed the other models based on accuracy (>= 85%). Next, grid search and Bayesian optimization were used for tuning the hyperparameters of all methods. Further, 10-fold cross-validation was done to validate the model performances. The present findings confirm the feasibility of using Raman spectroscopy combined with ML algorithms, which may facilitate a better understanding of this pathology.

Keywords— IRSM, Raman spectroscopy, machine learning, SVM, DT, XGBoost, CNN, AdaBoost, RF, GB, ANN, PCA, OPLS-DA

I. INTRODUCTION

Recurrent spontaneous miscarriage (RSM) is defined as the spontaneous loss of two or more clinically diagnosed pregnancies within 20 weeks of gestation [1]. Various causative factors such as endocrine, genetic, immunological, anatomic, infectious, thrombophilic, environmental and metabolic are known to be responsible for RSM [2]. Despite extensive research in the past few decades, no underlying cause can be identified in about half of the patients, and these cases are termed idiopathic. Thus, further research is warranted to understand the mechanism of idiopathic recurrent spontaneous miscarriage (IRSM). Recently, there has been considerable interest in exploring the pathophysiology of complex diseases using powerful label-free vibrational techniques [3-5]. Raman spectroscopy, based on the inelastic scattering of light that gives information about the vibrational modes of molecular bonds, is one such technique; it is totally non-invasive in nature and is increasingly gaining popularity for obtaining insight into the characteristics of proteins, carbohydrates, lipids and nucleic acids [6-7]. Various diseases lead to structural and chemical changes in tissues at a molecular level, which are reflected through intrinsic Raman fingerprints and can be effectively used as sensitive phenotypic markers of the disease.

Recently, the development of artificial intelligence (AI) has emerged rapidly into healthcare; it has established marked potential in disease diagnostics and treatment. Machine learning (ML) is a subset of AI and is currently being explored in various areas of reproductive health. ML has been used for the development of models for prediction of endometriosis, polycystic ovary syndrome (PCOS) and preeclampsia [8-10]. ML algorithms help to interpret rapidly and efficiently the vast amount of complex biomedical data in terms of valuable feature information. Therefore, ML is recognized as a promising method for the identification of patterns and has been extensively applied to understand disease mechanisms and to reduce inevitable errors in diagnosis, treatment and clinical decision-making [11-12]. To our knowledge, this is the first study where Raman spectroscopy combined with ML is used to understand the disease mechanism of IRSM, a common women's health issue. We aim to identify the best performing algorithm by hyperparameter tuning and develop predictive models for understanding the disease pathophysiology of IRSM.

II. METHODOLOGY

This study recruited women with a history of IRSM (n=15) and controls (n=15) as per inclusion criteria at the Assisted Reproduction Unit, Institute of Reproductive Medicine, Salt Lake, Kolkata. Endometrial tissue samples (the lining of the uterine cavity where the embryo implants) were collected from both groups during a favourable period of the menstrual cycle, known as the window of implantation (WOI). All tissue samples were fixed in 4% formalin and embedded in paraffin for Raman spectroscopy. All spectra were acquired in the spectral range of 400-1800 cm-1 (the fingerprint region) and the intensity measured. This region provides comprehensive information about biomolecules such as lipids, carbohydrates, proteins and nucleic acids. Further, the Raman spectral data were evaluated by ML algorithms including Extreme Gradient Boosting (XGBoost), decision tree (DT), support vector machine (SVM), Adaptive Boosting (AdaBoost), random forest (RF), gradient boosting (GB), convolutional neural network (CNN) and artificial neural network (ANN). The overall workflow of the study design is shown in Figure 1.

Fig. 1 Workflow of the study plan

Data processing

All spectra obtained were smoothed using the Savitzky-Golay method and baseline corrected to remove noise and fluorescence background using ORIGIN PRO 8.5 (OriginLab Corporation, Northampton, MA, USA). The band position of each biomolecule was determined using the peak analyzer algorithm of the Origin Pro software (Northampton, MA, USA). Data preprocessing enabled clear spectral peaks and improved the spectral quality considerably. The Raman peaks were assigned based on earlier published articles and literature [13-15]. The processed Raman data were then taken forward for chemometric analysis.
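An equivalent preprocessing step can also be scripted in Python. The sketch below is illustrative only, since the study itself used the Origin Pro tools; it assumes each spectrum is a 1D intensity array sampled over 400-1800 cm-1 and uses a low-order polynomial fit as a stand-in for the baseline-correction routine, with window, poly and baseline_deg being assumed values.

import numpy as np
from scipy.signal import savgol_filter

def preprocess_spectrum(wavenumbers, intensities, window=11, poly=3, baseline_deg=5):
    # Savitzky-Golay smoothing of the raw Raman intensities
    smoothed = savgol_filter(intensities, window_length=window, polyorder=poly)
    # Approximate the fluorescence background with a low-order polynomial and subtract it
    coeffs = np.polyfit(wavenumbers, smoothed, deg=baseline_deg)
    baseline = np.polyval(coeffs, wavenumbers)
    return smoothed - baseline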
Chemometric analysis

Principal component analysis (PCA) was conducted to gain information related to the spectral differences between the types of samples. Moreover, to obtain information about the similarity between the samples, hierarchical cluster analysis (HCA) was performed with a Euclidean distance matrix,

Euclidean distance(x, y) = √( Σᵢ₌₁ⁿ (yᵢ - xᵢ)² )   (1)

HCA allows visualization of the overall grouping and accordingly sub-groups the spectra based on their similarities [16]. HCA and PCA were performed using the ChemoSpec package of R. We also obtain the weights (loading plot) from the linear transformation, which contain the new variables and indicate the molecular vibrational modes that explain most of the variance. A supervised classification model, OPLS-DA (orthogonal partial least squares discriminant analysis), was applied to visualize class separation using SIMCA 13.0.1 (Umetrics, Sweden). OPLS-DA augments class segregation by eliminating variability that is not relevant to class separation.
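The same unsupervised workflow can be reproduced in Python; the study used the ChemoSpec package in R and SIMCA for OPLS-DA, so the scikit-learn/SciPy sketch below is only an illustration. X is assumed to be a matrix of preprocessed spectra (rows = samples, columns = wavenumber bins) and the file name is hypothetical.

import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

X = np.load("raman_spectra.npy")            # hypothetical array of preprocessed spectra
pca = PCA(n_components=2)
scores = pca.fit_transform(X)               # PC scores for the score plot
loadings = pca.components_                  # loadings point to the influential Raman bands

# HCA on the Euclidean distance matrix of Eq. (1)
distances = pdist(X, metric="euclidean")
tree = linkage(distances, method="ward")
clusters = fcluster(tree, t=2, criterion="maxclust")   # two expected clusters: IRSM vs. control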
Machine learning algorithms

Following spectral preprocessing, the Raman spectral data were subjected to ML in Python using the scikit-learn library. The spectral data were divided randomly into a training set and a validation set in a ratio of 70:30. The training set was used to train a classification model and the validation set was used to evaluate the model performance. Several ML algorithms, including SVM, AdaBoost, XGBoost, DT, RF, GB, CNN and ANN, were used to build the prediction models for IRSM. We also implemented neural networks to improve upon the ML methodologies. The performance of each classifier algorithm was measured independently.
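A minimal sketch of this split is given below, assuming the spectra X and the class labels y (0 = control, 1 = IRSM) are already loaded; stratification and the random seed are added here for reproducibility and are not stated in the text.

from sklearn.model_selection import train_test_split

# 70:30 random split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

Each classifier described below is then fitted on (X_train, y_train) and scored on (X_val, y_val).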

A. Support Vector Machine

SVM is a classifier which divides the datasets by defining a hyperplane, known as a maximum margin hyperplane. This
hyperplane maximizes the margin, i.e., the separation between the hyperplane and the nearest points of each class [17]. This algorithm achieves high discriminative power by using special nonlinear functions (polynomial kernels) to transform the input space into a higher-dimensional space. SVM has the advantages of increasing class separation and reducing prediction error. Additionally, SVM is used for both non-linear and linear discriminant analyses. The equation of the polynomial kernel is given as

K(x, y) = (γ·xᵀy + r)^d,  γ > 0   (2)
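As a hedged illustration, a scikit-learn SVM with this kernel can be configured as below; the degree, gamma and coef0 (the r term) values are placeholders rather than the tuned settings reported later in Table 2.

from sklearn.svm import SVC

# Polynomial kernel K(x, y) = (gamma * x^T y + r)^d as in Eq. (2)
svm_clf = SVC(kernel="poly", degree=3, gamma=0.01, coef0=1.0, probability=True)
svm_clf.fit(X_train, y_train)
print("SVM validation accuracy:", svm_clf.score(X_val, y_val))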

B. Adaptive Boosting

AdaBoost aims to merge many weak learners into a strong learner to improve the performance of the predictive model [18]. AdaBoost repeatedly calls a given weak learning algorithm, where at each step the weights of inaccurately classified instances are increased so that the weak learner focusses more on the difficult cases. In simple words, weak learners are transformed into strong ones.

C. Extreme Gradient Boosting

XGBoost is a gradient boosting algorithm that combines regression trees by sequentially adding predictors, each correcting the errors of the previous models [19].

D. Decision Tree

DT is a powerful ML model that builds a decision tree from a set of class-labelled training samples [20]. The tree can be explained by two entities, namely leaves and decision nodes. The leaves are the decisions or final outcomes, and the decision nodes are where the data are split.

E. Random Forest

RF is an ensemble method that operates by constructing a multitude of decision trees through random selection of features with controlled variance [21].

F. Gradient Boosting

GB is an ensemble technique which creates a model consisting of weaker prediction models, mostly decision trees, in which weak base classifiers gradually become better than the previous weak classifiers [22].
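The boosting and tree-based classifiers of subsections B-F can all be instantiated in a few lines; the sketch below uses scikit-learn and the separate xgboost package (an assumption, since the text does not name the XGBoost implementation) with default hyperparameters, as tuning is described later.

from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier   # assumes the xgboost package is installed

tree_models = {
    "AdaBoost": AdaBoostClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "GB": GradientBoostingClassifier(),
}
for name, model in tree_models.items():
    model.fit(X_train, y_train)                  # build each ensemble on the training spectra
    print(name, "validation accuracy:", model.score(X_val, y_val))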

G. Convolutional Neural Network

CNN is a class of neural networks that consists of several hidden layers performing convolutions to extract high-level features of the input data [23]. The network consists of convolution layers, pooling layers and fully connected layers, and was implemented using the TensorFlow library. Initially, a convolution block was added to the input with the rectified linear unit (ReLU) activation function. A dense layer and a global max pooling step were further added. The output of this layer was converted to a single dimension using a flatten layer. The output layer consisted of a dense layer with a sigmoid activation function, which added the final (predicted) classification label to the output. The flowchart of the CNN model is shown in Figure 2.

Fig. 2 Architecture of CNN
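A minimal Keras sketch of a 1D-CNN of this kind is given below; the filter count, kernel size and dense width are assumed values, and the layer ordering is one plausible reading of the description above rather than the authors' exact architecture (Figure 2).

import tensorflow as tf
from tensorflow.keras import layers, models

n_points = X_train.shape[1]                 # number of wavenumber bins per spectrum

cnn = models.Sequential([
    layers.Input(shape=(n_points, 1)),      # each spectrum treated as a 1D signal
    layers.Conv1D(32, kernel_size=7, activation="relu"),   # convolution block with ReLU
    layers.GlobalMaxPooling1D(),            # global max pooling step
    layers.Flatten(),                       # collapse to a single dimension
    layers.Dense(64, activation="relu"),    # dense layer
    layers.Dense(1, activation="sigmoid"),  # predicted IRSM probability
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
cnn.fit(X_train[..., None], y_train, epochs=50, batch_size=8,
        validation_data=(X_val[..., None], y_val))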
H. Artificial Neural Network

An artificial neural network was implemented using the PyTorch library. Initially, a linear layer was added, which takes the input dimension as its input size and the number of hidden dimensions (128) as the output units of the layer. A ReLU activation function was added next, and another linear layer was added. Further, a sigmoid activation function was added for the final output layer. A forward step function was defined, which performs a forward pass of the network. The loss was calculated from the output of the forward pass and backpropagation was initialized to compute the gradients. An optimizer.step() call then updates the model's parameters using the gradients calculated during the backpropagation step; the optimizer used was Adam with a learning rate α = 0.01. The architecture of the ANN model is shown in Figure 4.

The equation for the ReLU activation function is given as

R(z) = max(0, z)   (3)

The equation for the sigmoid activation function is given as

σ(z) = 1 / (1 + e^(-z))   (4)
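A compact PyTorch sketch consistent with this description is shown below; the number of training epochs and the tensor names are assumptions, while the hidden width (128), the Adam optimizer and the learning rate of 0.01 follow the text.

import torch
import torch.nn as nn

class ANN(nn.Module):
    # Linear -> ReLU -> Linear -> Sigmoid, as described above
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# X_train_t (float32, [n_samples, n_points]) and y_train_t ([n_samples, 1]) are assumed tensors
model = ANN(input_dim=X_train_t.shape[1])
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):                  # epoch count is an assumption
    optimizer.zero_grad()
    output = model(X_train_t)             # forward pass
    loss = criterion(output, y_train_t)   # loss from the forward-pass output
    loss.backward()                       # backpropagation computes the gradients
    optimizer.step()                      # update the parameters with those gradients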

Fig. 3 (A) Graph of ReLU function (B) Sigmoid function

Fig. 4 Architecture of ANN

Hyperparameter Tuning

When building ML models, the hyperparameters must be tuned and fixed to defined values to improve the model performance. Herein, for optimizing the parameters and improving model performance, the grid search method was implemented in Python using the scikit-learn package [24]. This method searches through different values of the model hyperparameters and selects the single subset that provides the best performance on a given dataset, which is then used to configure the model.

Four key parameters, i.e., C, degree, gamma and kernel, were used for SVM. For AdaBoost, parameters such as algorithm, learning_rate and n_estimators were used for tuning purposes. We tuned four parameters, namely colsample_bytree, gamma, max_depth and min_child_weight, for XGBoost. For DT, the parameters max_depth, max_features, ccp_alpha and criterion were tuned. For RF, four parameters, i.e., criterion, max_depth, max_features and n_estimators, were used. Parameters such as criterion, learning_rate, loss and max_features were tuned for GB. Bayesian optimization (Keras Tuner) was used for hyperparameter tuning of the layers of the CNN and ANN.

A grid search also includes an inbuilt K-fold cross-validation (CV) loop which cross-validates the best-fit model. All models were cross-validated using GridSearchCV; a separate cross-validation for each model was also conducted for additional verification. After tuning the hyperparameters, K-fold CV (k=10) was performed for each set of hyperparameter values and the performance measured by assessing the area under the receiver operating characteristic (ROC) curve (AUC). The hyperparameters that showed the highest accuracy were chosen for all ML algorithms.
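The grid for the SVM, as one example of the tuning described above, can be expressed as follows; the candidate values are illustrative and not the exact grid used by the authors, while the 10-fold CV and AUC scoring follow the text.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.1, 1, 10, 100],
    "degree": [2, 3, 4],
    "gamma": [0.001, 0.01, 0.1, "scale"],
    "kernel": ["linear", "poly", "rbf"],
}
search = GridSearchCV(SVC(probability=True), param_grid,
                      cv=10, scoring="roc_auc")   # inbuilt 10-fold CV scored by AUC
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Best cross-validated AUC:", search.best_score_)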

Cross-Validation

K-fold CV was used to validate the model performances. We applied the 10-fold CV method to the dataset. In this approach, each sample is used the same number of times for training and exactly once for validation across the 10 iterations, and the metrics are averaged over the iterations. Model performance was summarized using metrics such as accuracy, sensitivity and specificity, where TN, TP, FN and FP represent the number of true negatives, true positives, false negatives and false positives, respectively.

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (5)

Sensitivity = TP / (TP + FN)   (6)

Specificity = TN / (TN + FP)   (7)
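These metrics follow directly from the counts of a binary confusion matrix; a short sketch with scikit-learn, assuming y_val holds the true validation labels:

from sklearn.metrics import confusion_matrix

y_pred = svm_clf.predict(X_val)        # predictions from any fitted classifier (here the SVM sketched earlier)
tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. (5)
sensitivity = tp / (tp + fn)                    # Eq. (6)
specificity = tn / (tn + fp)                    # Eq. (7)
print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")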
In order to assess model performance, the areas under the ROC curves and F1 scores were also generated. An AUC close to 1 indicates that the model has good predictive performance. In addition, learning curves were generated; these are diagnostic tools that assess algorithm performance with different sizes of training data. The curves also indicate changes in learning performance and can effectively identify overfitting, underfitting and good fitting in models [25]. We obtained learning curves for seven different algorithms to diagnose the ML model performances.
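A sketch of how these additional measures might be computed with scikit-learn is given below; svm_clf is the fitted SVM from the earlier sketch, and the cross-validation setting and training-size grid for the learning curve are assumptions.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, f1_score
from sklearn.model_selection import learning_curve

auc = roc_auc_score(y_val, svm_clf.decision_function(X_val))   # area under the ROC curve
f1 = f1_score(y_val, svm_clf.predict(X_val))                   # harmonic mean of precision and sensitivity

# Learning curve: cross-validated scores at increasing training-set sizes
train_sizes, train_scores, val_scores = learning_curve(
    SVC(), X, y, cv=10, train_sizes=np.linspace(0.3, 1.0, 4))
print("mean validation accuracy per training size:", val_scores.mean(axis=1))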
III. RESULTS AND DISCUSSION

Raman spectra were generated using the endometrial tissues of women with and without a history of IRSM, and the vibrational modes of biomolecules in the spectral range of 400-1800 cm-1 were analyzed. A significant peak shift originating from amide I, amide III, the CH2 bending mode of proteins and lipids, glutamate and cholesterol was observed in IRSM when compared with controls. Moreover, the intensity of the assigned peaks was observed to be reduced in the IRSM group. The assigned Raman peaks and their respective shifts are presented in Table 1. The differences in the intensity of these characteristic peaks are responsible for the subsequent discrimination between the two groups. Further, chemometric analysis was applied to classify the spectral datasets and identify the biochemical differences between IRSM and controls.

Table 1. Raman shifts and assignment of peaks

S. No   IRSM (cm-1)   Controls (cm-1)   Vibrations
1       1650          1665              Amide I
2       1446          1451              CH2 bending mode of proteins and lipids
3       1254          1249              Amide III
4       1090          1096              Glutamate
5       550           548               Cholesterol

Further, PCA was performed to detect differences between the two groups (IRSM vs. controls). PCA demonstrated that the Raman spectra of the IRSM group differed from those of the control group. Also, the PCA plot demonstrated considerable overlap; this may be attributed to intra-class variability due to the use of patient samples. We also conducted an analysis of variance (ANOVA) to examine the variance by setting a threshold of 0.05 for the chi-square test; however, the variance was found to be statistically insignificant (p value > 0.05). Further, OPLS-DA analysis showed optimized class separation between IRSM and controls (R2Y = 0.992 and Q2 = 0.899). Furthermore, in the HCA analysis almost all Raman spectra of the control group exhibited a distinct similarity, while the spectra of IRSM tissues showed similarity among themselves, thereby clearly forming two different clusters. The loading plots of PC1 and PC2 represent the spectral bands that contribute most to the variance described by the principal components (Figure 5). Strong contributions were observed from the peaks at 1665 cm-1, 1451 cm-1, 1245 cm-1, 1096 cm-1 and 548 cm-1. These results indicate that amide I, the CH2 bending mode of proteins and lipids, amide III, cholesterol and glutamate are responsible for the separation between IRSM and controls.

Fig. 5 (A) Principal component analysis (PCA) shows separation between IRSM and controls (B) Loading plot of PC1 and PC2 shows distinct peak separation between IRSM and controls (C) OPLS-DA model shows optimized separation between the two groups (D) Hierarchical cluster analysis (HCA) of Raman spectra of endometrial tissue obtained from women with IRSM (n=15, red) and controls (n=15, black)

The study was carried a step further to examine the performance of the ML algorithms on the Raman spectral data. For this purpose, eight different models, i.e., XGBoost, DT, SVM, AdaBoost, RF, GB, CNN and ANN, were developed. To improve the model performances, the grid-search method and Bayesian optimization were used to perform hyperparameter tuning of all classifiers. The best parameters selected, with fixed defined values, are shown in Figure 7 and Table 2. After tuning the hyperparameters, K-fold cross-validation (CV) (k=10) was performed for each set of hyperparameter values and the AUC performance measured. SVM, XGBoost, DT, CNN and ANN were found to outperform all other models based on accuracy and AUC values.

Table 2. Parameters of different classifiers

Models                      Selected values of parameters
Support vector machine      C = 10, degree = 3, gamma = 0.001, kernel = linear
Random forest               criterion = gini, max_depth = 8, max_features = sqrt, n_estimators = 500
Adaptive Boosting           algorithm = SAMME.R, learning_rate = 0.97, n_estimators = 2
Decision tree               ccp_alpha = 0.1, criterion = entropy, max_depth = 8, max_features = log2
Extreme Gradient Boosting   colsample_bytree = 1, gamma = 2, max_depth = 5, min_child_weight = 1
Gradient boosting           criterion = friedman_mse, learning_rate = 0.5, loss = log_loss, max_features = sqrt

The confusion matrices of the classifiers SVM, XGBoost, DT and CNN were also generated. Confusion matrices are a useful way to assess a classifier: each row represents the actual target values, while each column displays the values predicted by the ML model. The confusion matrices of the classifiers are shown in Figure 6.

Fig. 6 Confusion matrix of (A) support vector machine (SVM) (B) Extreme Gradient Boosting (XGBoost) (C) decision tree (DT) and (D) convolutional neural network (CNN)

Fig. 7 Hyperparameter tuning of classifiers (A) Support vector machine (SVM) (B) Random forest (RF) (C) Adaptive Boosting (AdaBoost) (D) Decision tree (DT) (E) Extreme Gradient Boosting (XGBoost) (F) Gradient boosting (GB)

The performance of the applied classification models was determined in terms of sensitivity, specificity, accuracy, F1 score and AUC values. Sensitivity and specificity are measures of classification success in predicting the presence or absence of a disease. Additionally, accuracy is the most common performance metric for classification algorithms. The F1 score is the harmonic mean of sensitivity and precision. Further, K-fold CV (k=10) was used to validate the model performances using grid search. Comparing all the algorithms, the SVM, XGBoost and DT classifiers demonstrated the best performance and achieved classification accuracies of 90%, 85% and 85%, respectively (Table 3). The proposed neural network models, CNN and ANN, also showed very good results in terms of accuracy: the 1D CNN showed a mean accuracy of 90%, while the ANN model attained a mean accuracy of 92%. The ROC curves of the classifiers are presented in Figure 8.

Fig. 8 Receiver operating characteristic (ROC) curve of different models (A) Support vector machine (SVM) (B) Extreme Gradient Boosting (XGBoost) (C) Decision tree (DT) (D) Convolutional neural network (CNN)

Table 3. Comparison of machine learning algorithms

Models    Sensitivity   Specificity   Accuracy   AUC    F1 score
SVM       100%          81%           90%        0.91   90%
XGBoost   88%           81%           85%        0.85   84%
DT        88%           81%           85%        0.85   84%
CNN       100%          82%           90%        0.91   90%
ANN       100%          88%           92%        0.94   89%

In addition, learning curves were also generated to measure the algorithm learning performances with different training data sizes over time. The learning curves of the classifiers are shown in Figure 9.

Fig. 9 Learning curves of different models (A) Support vector machine (SVM) (B) Extreme Gradient Boosting (XGBoost) and (C) decision tree (DT)

Among all the models generated, SVM, DT, XGBoost, CNN and ANN exhibited the highest classification accuracy on the Raman spectral data. These findings confirm the feasibility of using Raman spectroscopy in combination with ML algorithms for understanding the disease pathophysiology of IRSM.

IV. CONCLUSION

We present a novel, high-throughput strategy using Raman spectroscopy to understand the altered biochemical signatures in the endometrial tissue of IRSM women. Raman spectral analysis showed a significant decrease in the expression of proteins, lipids, glutamate and cholesterol in IRSM as compared with controls. Additionally, we performed PCA and HCA combined with OPLS-DA for automated classification of the data. We further applied eight ML algorithms, including SVM, AdaBoost, XGBoost, DT, RF, GB, CNN and ANN, and evaluated the models using five different evaluation metrics, namely accuracy, sensitivity, specificity, F1 score and ROC/AUC. Grid search and Bayesian optimization were used for hyperparameter optimization to obtain the best combination of parameters. We observed that SVM, DT, XGBoost, CNN and ANN performed better than the other models in terms of accuracy.

This study has considerable clinical utility since the endometrium, i.e., the lining of the uterus where the embryo implants, remains poorly understood in women with IRSM. A significant downregulation of various biological macromolecules such as lipids, glutamate and proteins was observed in the endometrial tissue microenvironment of IRSM cases. It is concluded that Raman spectroscopy in conjunction with an ML approach helped identify biomolecular fingerprints in the endometrium of women with a history of IRSM. This approach paves the way for a better understanding of the endometrial impairment at a molecular level in women with IRSM.

ACKNOWLEDGMENT

The authors thankfully acknowledge the financial support provided by MHRD, India, through the Indian Institute of Technology Kharagpur.

REFERENCES

[1] Eshre Guideline Group on RPL, Bender Atik R, Christiansen OB, Elson J, Kolte AM, Lewis S, Middeldorp S, Nelen W, Peramo B, Quenby S, Vermeulen N, "ESHRE guideline: recurrent pregnancy loss," Human Reproduction Open, vol. 2018(2), 2018.
[2] Ford HB, Schust DJ, "Recurrent pregnancy loss: etiology, diagnosis, and therapy," Reviews in Obstetrics and Gynecology, vol. 2(2), pp. 76, 2009.
[3] Shen Z, He Y, Shen Z, Wang X, Wang Y, Hua Z, Jiang N, Song Z, Li R, Xiao Z, "Novel exploration of Raman microscopy and non-linear optical imaging in adenomyosis," Frontiers in Medicine, vol. 9, 2022.
[4] Chen SJ, Zhang Y, Ye XP, Hu K, Zhu MF, Huang YY, Zhong M, Zhuang ZF, "Study of the molecular variation in pre-eclampsia placenta based on micro-Raman spectroscopy," Archives of Gynecology and Obstetrics, vol. 290(5), pp. 943-946, 2014.
[5] Guleken Z, Bulut H, Bulut B, Paja W, Parlinska-Wojtan M, Depciuch J, "Correlation between endometriomas volume and Raman spectra. Attempting to use Raman spectroscopy in the diagnosis of endometrioma," Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol. 274, pp. 121119, June 2022.
[6] Bendifallah S, Puchar A, Suisse S, Delbos L, Poilblanc M, Descamps P, Golfier F, Touboul C, Dabi Y, Daraï E, "Machine learning algorithms as new screening approach for patients with endometriosis," Scientific Reports, vol. 12(1), pp. 1-2, January 2022.
[7] Lv W, Song Y, Fu R, Lin X, Su Y, Jin X, Yang H, Shan X, Du W, Huang Q, Zhong H, "Deep learning algorithm for automated detection of polycystic ovary syndrome using scleral images," Frontiers in Endocrinology, vol. 12, 2021.
[8] Schmidt LJ, Rieger O, Neznansky M, Hackelöer M, Dröge LA, Henrich W, Higgins D, Verlohren S, "A machine-learning-based algorithm improves prediction of preeclampsia-associated adverse outcomes," American Journal of Obstetrics and Gynecology, February 2022.
[9] Blass I, Sahar T, Shraibman A, Ofer D, Rappoport N, Linial M, "Revisiting the risk factors for endometriosis: A machine learning approach," Journal of Personalized Medicine, vol. 12(7), pp. 1114, July 2022.
[10] Kodipalli A, Devi S, "Prediction of PCOS and Mental Health Using Fuzzy Inference and SVM," Frontiers in Public Health, vol. 9, 2021.
treatment,” Scientific Reports, vol. 10(1), pp. 1-2, December datasets,” International Journal of Control Theory and
2020. Applications, vol. 9(40), 2016.
[12] Liu R, Bai S, Jiang X, Luo L, Tong X, Zheng S, Wang Y, Xu [20] Singer G, Marudi M, “Ordinal decision-tree-based ensemble
B, “Multifactor prediction of embryo transfer outcomes based approaches: The case of controlling the daily local growth rate
on a machine learning algorithm,” Frontiers in Endocrinology, of the COVID-19 epidemic,” Entropy, vol. 22(8), pp. 871,
vol. 12, 2021. August 2020.
[13] De Gelder J, De Gussem K, Vandenabeele P, Moens L, [21] Nguyen JM, Jézéquel P, Gillois P, Silva L, Ben Azzouz F,
“Reference database of Raman spectra of biological Lambert-Lacroix S, Juin P, Campone M, Gaultier A, Moreau-
molecules,” Journal of Raman Spectroscopy, vol. 38(9), pp. Gaudry A, Antonioli D, “Random forest of perfect trees:
1133-1147, September 2007. concept, performance, applications and perspectives,”
[14] Movasaghi Z, Rehman S, Rehman IU, “Raman spectroscopy of Bioinformatics, vol. 37(15), pp. 2165-2174, August 2021.
biological tissues,” Applied Spectroscopy Reviews, vol. 42(5), [22] Tabrizchi H, Tabrizchi M, Tabrizchi H, “Breast cancer
pp. 493-541, September 2007. diagnosis using a multi-verse optimizer-based gradient boosting
[15] Krafft C, “Raman spectroscopy and microscopy of cells and decision tree,” SN Applied Sciences, vol. 2(4), pp. 1-9, April
tissues,” Encyclopedia of Biophysics, vol. 2178, 2013. 2020.
[16] Sodo A, Verri M, Palermo A, Naciu AM, Sponziello M, [23] Lakhdari K, Saeed N, “A new vision of a simple 1D
Durante C, Di Gioacchino M, Paolucci A, di Masi A, Longo F, Convolutional Neural Networks (1D-CNN) with leaky-ReLU
Crucitti P, “Raman spectroscopy discloses altered molecular function for ECG abnormalities classification,” Intelligence-
profile in thyroid adenomas,” Diagnostics, vol. 11(1), pp. 43, Based Medicine, vol. 6, pp. 100080, January 2022.
December 2022. [24] Radzi SF, Karim MK, Saripan MI, Rahman MA, Isa IN, Ibahim
[17] William SN, Teukolsky SA, “What is a support vector machine, MJ, “Hyperparameter tuning and pipeline optimization via
Nat Biotechnol,” vol. 24, pp. 1565-1567, 2006. Grid search method and Tree-based AutoML in breast cancer
prediction,” Journal of Personalized Medicine, vol. 11(10), pp.
[18] Sevinç E, “An empowered AdaBoost algorithm 978, September 2021.
implementation: A COVID-19 dataset study,” Computers &
Industrial Engineering, vol. 165, pp. 107912, March 2022. [25] Giola C, Danti P, Magnani S, “Learning curves: A novel
approach for robustness improvement of load forecasting,”
[19] Ramraj S, Uzir N, Sunil R, Banerjee S, “Experimenting Engineering Proceedings, vol. 5(1), pp. 38, July 2021.
XGBoost algorithm for prediction and classification of different

