A machine learning approach identified a diagnostic model for pancreatic cancer through using circulating microRNA signatures
Article in Pancreatology · August 2020

DOI: 10.1016/j.pan.2020.07.399
Pancreatology 20 (2020) 1195-1204
Behrouz Alizadeh Savareh a,d, Hamid Asadzadeh Aghdaie b, Ali Behmanesh c,**, Azadeh Bashiri d, Amir Sadeghi b, Mohammadreza Zali b, Roshanak Shams b,e,*

a PhD in Medical Informatics, National Agency for Strategic Research in Medical Education, Tehran, Iran
b Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
c Student Research Committee, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
d Department of Health Information Management, School of Management and Medical Information Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
e Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Article info

Article history: Received 21 January 2020; Received in revised form 29 June 2020; Accepted 25 July 2020; Available online 9 August 2020

Keywords: Micro RNA; Circulating miRNA; Pancreatic cancer; Bioinformatics; Early detection; Machine learning

Abstract

Late diagnosis of pancreatic cancer (PC), due to the limited effectiveness of modern testing approaches, causes many patients to miss the chance of surgery and consequently leads to a high mortality rate. Pivotal changes in circulating microRNA expression levels in PC patients make it possible to diagnose and treat patients at earlier stages. In this study, a list of circulating miRNAs associated with pancreatic cancer was identified using bioinformatics methods by analyzing four GEO microarray datasets. The value of the top miRNAs was then assessed using a machine learning method. A combinatorial approach consisting of Particle Swarm Optimization (PSO) + Artificial Neural Network (ANN) and Neighborhood Component Analysis (NCA) iterations, applied to a collection of the top differentially expressed circulating miRNAs in PC patients, facilitated ranking them by significance. Functional analysis of the miRNAs in the final index was performed by predicting target genes and constructing PPI networks. Remarkably, the final model, consisting of miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p and miR-532-5p, showed great diagnostic results on the investigated cases and the validation set (Accuracy: 0.93, Sensitivity: 0.93, and Specificity: 0.92). Kaplan-Meier survival assessments of the top-ranked miRNAs revealed that three miRNAs, hsa-miR-1469, hsa-miR-663a and hsa-miR-532-5p, had meaningful associations with the prognosis of patients with pancreatic cancer. This miRNA index may serve as a non-invasive and potential PC diagnostic model, although experimental testing is needed.

© 2020 IAP and EPC. Published by Elsevier B.V. All rights reserved.
* Corresponding author. Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
** Corresponding author.
E-mail addresses: aa.behmanesh@gmail.com (A. Behmanesh), Shams.rosha.86@gmail.com (R. Shams).
https://doi.org/10.1016/j.pan.2020.07.399
1424-3903/© 2020 IAP and EPC. Published by Elsevier B.V. All rights reserved.

Introduction

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest malignancies and a major health problem worldwide [1]. Resection surgery can raise the 5-year survival rate to 20%, but only 10-20% of patients are eligible for this treatment, since early detection at a treatable stage of the disease remains challenging [2]. The average survival time following diagnosis is often less than 6 months [3]. Late diagnosis of pancreatic cancer (PC), due to the limited efficacy of conventional diagnostic methods, can cause many patients to miss the chance of treatment by surgery and lead to a high mortality rate [4]. Numerous molecular components such as CA19-9 and CA125 have been introduced as potential markers for the detection of pancreatic tumors [5,6]; however, they are not sufficiently sensitive and specific to reliably discriminate cancer from healthy or benign conditions [7]. MicroRNAs (miRNAs) have been presented as promising biomarkers for the diagnosis of PC in recent decades [8]. They are a group of short non-coding RNA molecules of 19-25 nucleotides that have been considered promising biomarkers for early cancer diagnosis and precise prognosis [9].
Some miRNAs play crucial roles in the progression of cancer, and aberrant expression of those miRNAs may cause the development of many types of tumors [10,11]. Identifying the proper miRNAs will therefore be helpful in the diagnosis of the disease [12]. Computational approaches, specifically machine learning, have been used by many scientists for cancer diagnosis and classification using large datasets consisting of miRNAs as biomarkers [13-16]. In this study, we identified a number of significantly differentially expressed serum miRNAs in 671 microarray PDAC expression profiles using bioinformatics techniques. The purpose of the study was to develop a diagnostic model for PDAC using a combination of bioinformatics methods and machine learning approaches, including Particle Swarm Optimization (PSO), Artificial Neural Network (ANN) and Neighborhood Component Analysis (NCA). Accordingly, it was critical to narrow down the large number of miRNAs associated with PDAC in order to detect the most discriminative subsets of miRNA features. These miRNAs are then treated directly as features, to which feature selection methods can be applied to remove the remaining irrelevant and unnecessary ones. In machine learning, feature selection is a technique for selecting a subset from a given set of features based on certain criteria without altering the original features, which preserves the interpretability of the results. This hinders overfitting and boosts classification performance, specifically with gene expression data, which usually has a high number of features. Therefore, we focused on the top significant miRNA signatures that had the lowest adjusted p-values and the highest logarithm of fold change (log-FC) in the results of our bioinformatics analyses. We then applied feature selection methods to this reduced group of miRNAs to find out which ones are more critical for cancer diagnosis. Although some of those miRNAs have already been experimentally shown to be associated with pancreatic cancer, we used machine learning techniques to demonstrate an association that can serve as a diagnostic model for PDAC.

Material & methods

Microarray datasets selection and identification of differentially expressed miRNAs

In order to find the most promising diagnostic miRNA biomarkers for PDAC, a total of four GEO profiles (GSE113486, GSE59856, GSE85589, and GSE106817), consisting of 671 selected miRNA expression profiles from serum samples of PDAC patients and healthy controls, were downloaded and considered for differential expression analysis. Merging of the data of all 671 samples and differential expression analysis were done with NetworkAnalyst v3.0 [17], an online visual analytics platform for comprehensive gene expression profiling and meta-analysis. NetworkAnalyst 3.0 is a powerful online data analytics platform specialized in transcriptome profiling of gene expression data, network analysis and meta-analysis. Integrating statistical significance (p-values) with biological context (fold changes) from different sources of datasets allows users to easily identify the most promising gene candidates from differential expression analysis results. All data were uploaded and normalized with a log2 transformation, and DE analysis for Cancer vs Control was performed with a P-value cut-off of 0.05. At the next step, the integrity of the data was checked. For meta-analysis, Fisher's method with a P-value cut-off of 0.05 was applied to combine the p-values. Top miRNAs with the lowest p-values and a logarithm of fold change (log-FC) of ±2 were selected for further evaluation.

Machine learning

In order to identify the miRNA signatures that are the most promising markers in the diagnosis of PDAC, a procedure was designed using a combination of data mining techniques, illustrated in Fig. 1. Each of them is briefly described below.

Normalization
Normalization is applied for data preparation in machine learning only when features have different ranges. Normalization has a positive effect on the accuracy of the models. This technique changes the values of variables to a common scale without losing the useful information. Min-Max normalization is one of the common methods; it rescales variables to between 0 and 1, according to the following formula:

$$X_{norm} = \frac{X_{org} - Min}{Max - Min} \qquad (1)$$

Feature selection
In this section, the most useful features in the dataset were selected. The goal of feature selection is to decrease training time and to increase model interpretability and generalization performance on the test set. To select the most useful features in the dataset, two separate paths were used: 1) a combination of Particle Swarm Optimization (PSO) and Artificial Neural Network (ANN) to select features, and 2) Neighborhood Components Analysis (NCA), which approaches feature selection from another perspective. Each of these methods is described in more detail below, together with the reasons for choosing it.

Particle Swarm Optimization (PSO). As the number of features grows, choosing the most important ones with the lowest classification error requires a number of computations that grows exponentially. Because the search space here has 28 dimensions (features representing gene expression), selecting the optimal subset is computationally expensive. Heuristic methods such as PSO are therefore an appropriate approach to feature selection in this situation. Owing to its relatively high optimization speed and its stability in finding optimal answers, PSO is a suitable approach and is fully compatible with the requirements of this study, so it was used as the optimizer. In PSO, the search is driven by the velocity of the particles. Over successive generations, only the best-performing particle transmits information to the other particles, so the search is very fast. Consequently, the calculation in PSO is very simple and, compared with other evolutionary computations, easy to complete [18]. Every particle in the population has two vectors, a velocity vector and a position vector [19]. The PSO algorithm is iterative. It promotes social search behavior among particles in the search space, where every particle represents one point in n-dimensional space as a candidate solution for a given problem. In comparison with other evolutionary algorithms (EAs) such as the Genetic Algorithm (GA), PSO has improved search efficacy with faster and more stable convergence rates [20]. The PSO approach used in this study is based on the standard PSO [21], with the following changes to suit the problem. Feature selection is encoded in binary format, and over the evolutionary iterations the PSO tries to find a binary code that minimizes the classification error in PC classification. Features that have a stronger correlation with PC classification are then identified through the PSO iterations. The PSO selects the more important features
that are fed to the ANN (for fitness evaluation).

Artificial neural network (ANN). Because it simulates the human brain's decision-making process in the form of mathematical models, the artificial neural network has been successful in analyzing nonlinear relationships and is therefore often preferred over other machine learning techniques.
Based on the nonlinear and complex relationships between the input variables (circulating microRNA signatures) and the predicted result (pancreatic cancer), the ANN was used as the fitness function in the PSO mechanism. The ANN is also fast, accurate and reliable in prediction or approximation, especially when numerical and mathematical methods fail. Moreover, the ANN is simple to use because of its power to deal with multivariate and complicated problems [22].
An artificial neural network, a supervised learning algorithm, includes an interconnected group of artificial neurons arranged in successive layers. It processes information through a connectionist approach to computation. At first, the ANN makes random predictions, which are compared with the target outputs to calculate the error. The objective of the ANN is to minimize this error. As an adaptive system, the ANN changes its structure (the weights among neurons) during a training phase to minimize the error [23]. Fig. 2 demonstrates the general structure of an ANN. Based on this generic structure, the ANN designed to predict pancreatic cancer was as follows: the input layer contains the circulating microRNA signatures, a single hidden layer is used, and the output layer contains the PC classification. The hidden layer was constructed with 15 neurons (three times the number of inputs, according to the rule of thumb). The activation function in the hidden layer was 'tansig', and 'softmax' was selected for the output layer. These activation functions are appropriate for pattern recognition ANNs [24]. In order to implement the PC predictor model in MATLAB 2019, pattern recognition networks (patternnet) were used for training and testing the ANN. Pattern recognition networks are feed-forward networks that can be trained to classify observation cases according to target classes.

Neighborhood Component Analysis (NCA). In addition to the combination of PSO and ANN, the other mechanism used to select important features was based on Neighborhood Component Analysis. NCA is a non-parametric method for selecting features with the goal of maximizing the prediction accuracy of regression and classification algorithms. The reason for using NCA is that the nearest neighbor rule, NCA's core, is a simple and efficient nonlinear decision rule and often yields competitive results compared with state-of-the-art classification methods [25]. Nearest neighbor (KNN) is an extremely simple yet surprisingly effective method for classification. Its appeal stems from the facts that its decision surfaces are nonlinear, there is only a single integer parameter (which is easily tuned with cross-validation), and the expected quality of predictions improves automatically as the amount of training data increases [26].
NCA has some advantages over other methods such as PCA and LDA. With a low-dimensional projection, the classes are consistently much better separated by the NCA transformation than by either PCA (which is unsupervised) or LDA (which has access to the class labels). Of course, the NCA transformation is still only a linear projection, just optimized with a cost function that explicitly encourages local separation. Furthermore, a nearest-neighbor classification can be applied in the projected space. Using the same projection learned at training time, it projects the training set and all future test points and performs KNN in the low-dimensional space using the Euclidean measure [26].
All of the above-mentioned qualities justify using NCA. This algorithm directly maximizes a stochastic variant of the leave-one-out k-nearest neighbors (KNN) score on the training set. Moreover, it can learn a low-dimensional linear projection of the data that can be used for data visualization and fast classification [26]. In NCA, the importance of each feature is calculated based on its role in the prediction of the output [25]. However, since the real data distribution is unknown, the algorithm attempts to optimize the performance based on the training data. The algorithm is restricted to learning quadratic distance metrics, which can always be represented by symmetric positive semi-definite matrices. Writing such a matrix as Q = A^T A for a transformation matrix A, the metric is effectively learned as in equation (2).

$$d(x, y) = (x - y)^{\top} Q (x - y) = (Ax - Ay)^{\top} (Ax - Ay) \qquad (2)$$

The goal is to maximize f(A), which is defined by equation (3).

$$f(A) = \sum_{i} \sum_{j \in C_i} p_{ij} = \sum_{i} p_i \qquad (3)$$

Since NCA starts from random locations in the numerical space, its feature rankings may vary between runs. Therefore, in order to eliminate the bias introduced by the random starting location, the NCA process was repeated a large number of times (n = 1000) and the mean of the values was reported as the result of NCA.

Validation
The two feature selection methods were validated individually. In the first method (combination of PSO and ANN), the data samples at each run of the neural network are grouped into folds in a cross-validation scenario with k = 5 for training and testing of the model. In order to eliminate the bias caused by the random start in the NCA problem space, feature selection was performed over a large number of rounds (iterations = 1000). The mean of the importance values was taken as the result of this method in terms of the importance of the features.

Functional analysis of the miRNAs from final model

Kaplan Meier plotter
Kaplan Meier plotter is an online tool capable of assessing the impact of 54k genes and miRNAs on survival in 21 cancer types. The system incorporates gene chip and RNA-seq data; sources for the databases include GEO, EGA, and TCGA. The primary purpose of the tool is meta-analysis-based discovery and validation of critical biomarkers [27]. For a better understanding of the associations of the considered miRNAs with the prognosis of PDAC, the KM plotter was used. In the miRNA section, the pan-cancer option was selected, and the search was restricted to pancreatic ductal adenocarcinoma (n = 178). The following criteria were used: Survival: OS; Auto select best cutoff: checked; Follow up threshold: 240 months; Censor at threshold: checked.

Target genes prediction
The mirDIP target gene prediction online tool v.4.1 was used to find the highest-ranked probable target genes for each miRNA in the joint final model. By combining different target gene prediction algorithms, mirDIP is able to predict approximately 152 million human microRNA-target interactions, which are gathered from 30 other resources, and assigns an integrative score to each of those predicted target genes [28]. The provided integrative confidence score is used to classify the predicted target genes into "very high", "high" and "medium" sections. In this study, a confidence score of 0.3 was set as the cut-off for selecting predicted target genes for further functional evaluation.
Fig. 2. General form of an artificial neural network.
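As a rough illustration of the network structure described in the Methods (five inputs, one hidden layer of 15 'tansig' neurons, a 'softmax' output layer, and inputs rescaled with equation (1)), the following sketch runs a single forward pass. It is written in Python rather than MATLAB, the weights are random placeholders rather than trained values, and all function and variable names are hypothetical.

```python
import math
import random

def min_max_normalize(column):
    """Rescale a list of values to [0, 1] as in equation (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: nothing to rescale
    return [(v - lo) / (hi - lo) for v in column]

def tansig(x):
    """MATLAB's 'tansig' is mathematically the hyperbolic tangent."""
    return math.tanh(x)

def softmax(scores):
    """Turn raw output scores into class probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W1, b1, W2, b2):
    """One forward pass: input -> tansig hidden layer -> softmax output."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return softmax(scores)

# Placeholder (untrained) weights: 5 inputs, 15 hidden neurons, 2 classes
rng = random.Random(0)
n_inputs, n_hidden, n_classes = 5, 15, 2
W1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_classes)]
b2 = [rng.uniform(-1, 1) for _ in range(n_classes)]

# Hypothetical, already normalized expression values for the five miRNAs
sample = [0.2, 0.8, 0.5, 0.1, 0.9]
probs = forward(sample, W1, b1, W2, b2)
print(probs)
```

Training (adjusting W1, b1, W2 and b2 to minimize the classification error) is omitted here; in the study it was handled by MATLAB's pattern recognition networks.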
Fig. 1. General selection procedure.

MiRNA-mRNA & PPI network construction

STRING database v.9.0.5 was used to analyze the interactions between the target genes of the selected miRNAs [29]. The confidence interaction score was set to 0.7. The protein-protein interaction (PPI) networks were uploaded to and visualized with Cytoscape software. Another network consisting of the interactions between the selected miRNAs and their predicted target genes was constructed and merged with the PPI network using Cytoscape [30].

External validation

An independent cohort was assembled as a validation set, consisting of serum miRNA expression profiles from patients with pancreatic cancer and healthy controls. The subjects were chosen from two other GEO datasets (GSE112264 & GSE124158) containing PC serum miRNA profiles and healthy controls (70 controls and 81 PC). The validation set samples were pre-processed and then fed into the trained model. The confusion matrix was analyzed based on the network's predictions for these data.

Results

Differentially expressed miRNAs (DEMs)

A total of 1346, 1471, 127 and 93 miRNAs showed significant up/down regulation in the GSE106817, GSE113486, GSE85589 and GSE59856 microarray datasets, respectively. After integration of the results, 27 DEMs showed a combined log-FC of ±2 with the lowest p-values (Table 1).

Table 1
Top differentially expressed miRNAs through all analyzed datasets.

miRNA                Combined Log-FC    P-val
hsa-miR-125a-3p      0.457772533        7.68E-81
hsa-miR-8073         0.398337882        9.79E-80
hsa-miR-1238-5p      0.500430762        2.12E-78
hsa-miR-6893-5p      0.222837086        3.94E-60
hsa-miR-1290         0.251870064        1.36E-59
hsa-miR-4530         0.252766227        3.12E-52
hsa-miR-663a         0.217248906        3.12E-52
hsa-miR-1469         0.352761675        2.76E-47
hsa-miR-92a-2-5p     0.242443823        5.66E-46
hsa-miR-125b-1-3p    0.25596432         4.45E-45
hsa-miR-1236-5p      0.262190493        9.19E-43
hsa-miR-5100         0.247321172        2.51E-42
hsa-miR-6075         0.257962134        6.24E-41
hsa-miR-6789-5p      0.282952299        1.34E-38
hsa-miR-7852         0.218510073        1.41E-38
hsa-miR-4536         0.285274858        4.34E-38
hsa-miR-4490         0.308425262        5.13E-38
hsa-miR-3125         0.502933182        7.25E-38
hsa-miR-4476         0.242321912        5.93E-37
hsa-miR-575          0.226364357        7.39E-37
hsa-miR-4736         0.209781341        2.42E-36
hsa-miR-532-5p       0.235895661        4.16E-36
hsa-miR-3910         0.238832069        4.30E-36
hsa-miR-3927         0.230507438        6.91E-36
hsa-miR-134-3p       0.215457081        2.06E-34
hsa-miR-3128         0.201005266        2.82E-34
hsa-miR-4696         0.246605902        5.07E-34
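The combined p-values reported for the DEMs come from Fisher's method, as stated in the Methods. As a minimal sketch of how that combination works in general, and not of how NetworkAnalyst implements it, the statistic -2 Σ ln(p_i) over k independent p-values is compared with a chi-square distribution with 2k degrees of freedom. The function name and the example p-values below are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where the statistic is
    -2 * sum(ln p_i) and the combined p-value is the upper tail of a
    chi-square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # For even degrees of freedom (2k) the chi-square upper tail has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series

# Hypothetical per-dataset p-values for one miRNA across two datasets
stat, combined = fisher_combined_p([0.05, 0.05])
print(stat, combined)
```

Two datasets that are each only borderline significant yield a combined p-value below either input, which is why agreement across datasets pushes a miRNA up the ranking.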
Machine learning

The machine learning process for PC consists of three stages in which different data mining techniques are used. The results of these techniques are presented in this section.

Feature selection

For selecting features to build the model, two methods were employed. First, the more important features in both methods were selected and evaluated; then, the selected features were combined and analyzed. Finally, the performance of the ANN in the prediction of PDAC based on the selected features (miRNA signatures) was analyzed. Fig. 3c illustrates the result of the intersection of the two feature selection methods. The final model comprises the most promising potential miRNAs (miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p and miR-532-5p).

PSO + ANN

Fig. 3a shows the features selected using PSO and ANN. The results show that the combination of PSO and ANN selected 14 miRNAs (miR-125a-3p, miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p, miR-5100, miR-7852, miR-4536, miR-4490, miR-3125, miR-532-5p, miR-3927, miR-3128 and miR-4696) as the most powerful representatives for distinguishing PDAC patients from healthy controls.

NCA

Fig. 3b illustrates the results of the NCA iterations as the mean of the NCA feature importance values over 1000 iterations. The results show that eight miRNA signatures (miR-8073, miR-6893, miR-4530, miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p and miR-532-5p) were selected by NCA.

Fig. 3. Combination of different feature selection methods (PSO + ANN and NCA): a) PSO + ANN feature selection result, b) mean NCA values over iterations, and c) final set of selected features, the intersection of the two feature selection paths.

Cross-validation
In order to validate the results and prevent any biases, validation of the data analysis results was performed as follows.
The first path of feature selection (PSO + ANN) was run iteratively, since PSO is an evolutionary optimization technique. Furthermore, the performance of the ANN in the classification of PDAC based on the important miRNA signatures was evaluated. Fig. 4 shows the error of ANN training, validation and testing. As shown, after a few epochs (about 42) the error gradient (the difference between the errors of two successive training epochs) flattened and the error value converged. This suggests that the ANN did not over-fit and that the findings are reliable. On the other hand, in order to avoid biases in the second path (NCA), as stated in the Methods section, repeated rounds of NCA execution were performed to achieve more accurate and reliable results (number of iterations = 1000).

Fig. 4. Error of ANN in PC classification task (train, validation and test).

MiRNAs associated with overall survival of PDAC

Survival analysis of the top-ranked miRNAs from the joint model was done using Kaplan-Meier plotter. Three of the five highly ranked miRNAs in the final model, hsa-miR-1469, hsa-miR-663a and hsa-miR-532, were significantly associated with the prognosis of patients with PDAC. These three miRNAs obtained P-values of 0.001, 0.001 and 0.009, respectively, and hazard ratios of 2.27, 2.27 and 0.59, respectively, between the high and low expression groups. Fig. 5A, B and C show the Kaplan-Meier survival curves for those three miRNAs.

Target genes of the joint model miRNAs

Overall, 1312 target genes were predicted for the three up-regulated miRNAs (miR-532, miR-1469 and miR-663a) and 309 for the two down-regulated miRNAs (miR-125b-1-3p and miR-92a-2-5p).

PPI & miRNA-mRNA networks

Integrating the constructed PPI network of predicted target genes with the miRNA-mRNA interaction networks in Cytoscape, separately for down-regulated (Fig. 6) and up-regulated (Fig. 7) miRNAs, showed that the down-regulated miRNAs miR-92a-2-5p and miR-125b-1-3p have two possible target genes in common: PEA15 and FOXP4 are targeted by both miRNAs in this network. The interaction network of the up-regulated miRNAs (miR-1469, miR-663a and miR-532) demonstrated that these three miRNAs target a module consisting of 45 predicted target genes.

External validation

Based on the network's predictions for the validation data, the confusion matrix gave the following results: Accuracy: 0.93, Sensitivity: 0.93 and Specificity: 0.92 (Fig. 8).
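For readers less familiar with these measures, the following sketch shows how accuracy, sensitivity and specificity are derived from the four cells of a binary confusion matrix. The counts used in the example are hypothetical and are not the counts from the study's validation set; the function name is likewise illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: cancer samples predicted as cancer
    fn: cancer samples predicted as control
    tn: control samples predicted as control
    fp: control samples predicted as cancer
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # fraction of cancer samples detected
        "specificity": tn / (tn + fp),   # fraction of controls correctly cleared
    }

# Hypothetical counts chosen only to make the arithmetic easy to follow
metrics = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```

Sensitivity depends only on the cancer samples and specificity only on the controls, so the two can differ even when overall accuracy is high.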
Fig. 5. Survival curves for high ranked miRNAs.

Fig. 6. The interaction network of down-regulated high ranked miRNAs and their predicted target genes. Two predicted target genes (PEA15 and FOXP4) are in common between
two miRNAs.
Discussion

PDAC is one of the deadliest cancers and a major health problem worldwide. The main challenge for the diagnosis and treatment of this cancer is the late detection of patients, due to the late manifestation of symptoms and the limited efficacy of current diagnostic methods. Accordingly, a variety of diagnostic and prognostic biomarkers have been proposed, but most of them have failed to find clinical application. In recent years, microRNA-based liquid biopsy has become a promising area of research for the early diagnosis of malignancies. Several studies have also found miRNAs to be abnormally expressed in plasma specimens of PC patients, indicating that circulating miRNAs could be useful for PC diagnosis [31]. For example, for the diagnosis of different stages of PC, Ganepola et al. utilized a panel of three miRNAs (miR-22, miR-642b-3p, and miR-885-5p) in plasma, and the AUC value was 0.97 for discrimination of the PC cases [32]. In another study, Liu R et al. used a serum panel consisting of miR-20a, miR-21, miR-24, miR-25, miR-99a, miR-185, and miR-191 for the detection of stage I-IV PCs, and the AUC value was 0.99 [33]. It should be noted that top biomarkers, even those acquired by highly advanced methods, have not reached a better diagnostic and prognostic performance than current biomarkers in large-scale clinical trials [34]. Over recent decades, advances in computational biology in cancer omics-based research (such as machine learning methods) have enabled researchers to evaluate more powerful diagnostic biomarkers and models for PC [35]. In this study, by analyzing online microarray miRNA expression profiles from the GEO (Gene Expression Omnibus) database, we determined a set of circulating miRNAs of patients with pancreatic cancer as a potential diagnostic model.

This section reviews the results of this study from a variety of perspectives:

Techniques used in feature selection

Using a combination of Artificial Intelligence techniques, we obtained a valid analysis of the ordering of feature importance. As the results of the study show, the Artificial Neural Network, as a simulation of the human brain's learning scenario, combined with Particle Swarm Optimization, as a simulation of the collective decision-making process in nature, leads to an appropriate way of selecting features. On the other hand, ranking features using the NCA iterations, a purely mathematical approach, reinforces and confirms the findings of the first path. Combining the results of the two paths leads to results that are more reliable both mathematically and practically. In fact, we used an unbiased data-driven technique consisting of microarray miRNA transcriptomic data integrated with machine learning approaches to investigate and confirm potential biomarkers together as diagnostic models for PDAC detection.

Advantages and disadvantages of the methods used

First, the types of feature selection methods (filter-based, embedded and wrapper) are described. Filter methods pick up the intrinsic properties of the features (i.e., the "relevance" of the features) measured via univariate statistics instead of cross-validation performance. Wrapper methods (such as PSO + ANN) measure the "usefulness" of features based on the classifier performance. Embedded methods are quite similar to wrapper methods, since they are also used to optimize the objective function
Fig. 7. The interaction network of up-regulated high ranked miRNAs. The genes with green color represent genes with tumor-suppressive functions.
or performance of a learning algorithm or model (such as ANN). Wrapper methods essentially solve the "real" problem (optimizing the classifier performance), but they are also computationally more expensive than filter methods because of the repeated learning steps and cross-validation. Given the high processing power of today's hardware, the processing burden of this method is bearable and is not problematic for data that are not extremely large (such as ours).

Biological analysis of results

At the first step, a set of 27 miRNA signatures, captured through differential expression analysis as the most significant markers for discriminating PDAC from healthy controls, was used as the input data for two independent feature selection algorithms (PSO + ANN and NCA). We used two different methods in order to increase the stringency of the work and to find the most promising model among all the input features. Combining the results of those algorithms, we reached a final model consisting of 5 miRNAs (miR-663a, miR-1469, miR-92a-2-5p, miR-125b-1-3p and miR-532-5p) with the greatest combinatorial AUC score in differentiating PC from controls. For validation, we checked the importance and association of the selected miRNAs from the final model with the overall survival rates of PC patients using another online database, KM Plotter. The results showed that 3 miRNAs from the final model (hsa-miR-1469, hsa-miR-663a and hsa-miR-532) are associated with the overall survival of PDAC patients based on their up- or down-regulated expression patterns. Next, to further evaluate the functions and roles of the considered miRNAs, we performed target gene prediction and functional enrichment analysis, as well as PPI network construction for those target genes. Finally, two miRNA-target gene interaction networks were constructed, for up- and down-regulated miRNAs separately. The results are further discussed as follows:
MiR-1469 has shown aberrant expression in a variety of cancers including pancreatic cancer [36-38]. Although most studies have reported down-regulation of miR-1469 in other cancers, in this study we found a significant up-regulation of this miRNA in PDAC. The overall survival analysis also showed that patients with high expression of miR-1469 have a significantly poorer prognosis than the low-expression group (Fig. 5A). Similarly, miR-663a, which was among the up-regulated miRNAs in our final model, has been the subject of controversy regarding its expression levels in different types of cancer, including PC. Some studies reported down-regulation of this miRNA in serum samples of gastric, colorectal, hepatocellular and non-small cell lung cancers [39,40], but others reported up-regulation of this miRNA and assigned it an oncogenic function in a group of cancers [41,42]. The results of our analyses of data from 671 PC serum samples and healthy controls showed that the level of this miRNA is
Fig. 8. The performance of the final model in diagnosis of PC patients from healthy controls in the validation set.
significantly up-regulated in PC in comparison to the control group. This pattern is in line with the overall survival analysis that we performed for this gene, which demonstrated that high levels of miR-663a are associated with poorer prognosis of PDAC patients. The same situation is true for miR-532: this gene shows diverse expression in different types of cancers [43-45]. Regarding PC, there is no report of deregulation of this gene in the literature, but our evaluation of serum expression data from PC patients shows a distinct up-regulation of this gene. However, the KM plot showed that PDAC patients with decreased expression of this gene have a shorter life span and poorer survival. For all these 3 up-regulated miRNAs of our final model, we performed target gene prediction and miRNA-miRNA interaction analysis. The results showed that these 3 miRNAs may regulate a module consisting of 45 target genes. Functional evaluations revealed that some critical tumor suppressors, such as CDKN2A and RHOB, are amongst the target genes of those miRNAs. These findings suggest that the considered miRNAs may be associated with PC tumorigenesis via inhibition of those tumor suppressor genes; however, the existence of such associations needs further experimental validation. On the other hand, an interaction network was also constructed for down-regulated miRNAs and their target genes. MiR-92a-2-5p and miR-125b-1-3p were shown to have 2 target genes in common, PEA15 and FOXP4. PEA15, a well-known ERK signaling regulator with anti-proliferation effects [39], does not seem to have any association with the down-regulation of miR-92a-2. But the other gene, FOXP4, has been reported as a modulator of tumorigenesis by several studies [40,41]. However, to infer the correlations among any of the discussed miRNAs and target genes with PC, experimental evaluations are needed. Hence, although some additional biomarkers were identified as a diagnostic model in this bioinformatics and machine learning study for potential consideration in future research, the outputs of these analyses do not support the immediate clinical use of these biomarkers without further comprehensive testing in broad case-control and cohort studies.

Conclusion

This study aimed to find a robust diagnostic model for pancreatic cancer through machine learning methods, using miRNA biomarkers obtained through a comprehensive bioinformatics data mining approach. Our results established an index including clinically-chosen miRNAs as biomarkers for pancreatic cancer diagnosis and demonstrated that, by ranking miRNAs via feature selection methods, the top discriminating miRNAs for pancreatic cancer detection among those bioinformatically verified can be acquired. Nonetheless, these findings have to be clinically confirmed.

Author contribution

RS designed the study, did the bioinformatics analyses and wrote the manuscript. BA performed the machine learning analyses and helped write the manuscript. AB and AB helped improve the illustrations and data collection. HA, AS and MZ were the clinical counselors, experts in pancreatic cancer, who clarified the main goals of the project and interpreted the data. SS edited the manuscript in English writing and scientific aspects.

Declaration of competing interest

The authors declare they have no conflict of interest.

Acknowledgments

We thank our colleague, Dr. Golnaz Bahramali, Department of Hepatitis and AIDS, Pasteur Institute of Iran, who provided insight and expertise that greatly assisted the research, although she may not agree with all of the interpretations of this paper.

References

[1] Siegel R, Ma J, Zou Z, Jemal A. Colorectal cancer statistics, 2014. CA A Cancer J Clin 2014;64(1):9-29.
[2] Von Hoff DD, Ervin T, Arena FP, Chiorean EG, Infante J, Moore M, et al. Increased survival in pancreatic cancer with nab-paclitaxel plus gemcitabine. N Engl J Med 2013;369(18):1691-703.
[3] Gillen S, Schuster T, Zum Büschenfelde CM, Friess H, Kleeff J. Preoperative/neoadjuvant therapy in pancreatic cancer: a systematic review and meta-analysis of response and resection percentages. PLoS Med 2010;7(4):e1000267.
[4] Ducreux M, Caramella C, Hollebecque A, Burtin P, Goéré D, Seufferlein T, et al. Cancer of the pancreas: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 2015;26(suppl_5):v56-68.
[5] Jiang X, Tao H, Zou S. Detection of serum tumor markers in the diagnosis and treatment of patients with pancreatic cancer. Hepatobiliary pancreatic diseases international. HBPD INT 2004;3(3):464-8.
[6] Ballehaninna UK, Chamberlain RS. The clinical utility of serum CA 19-9 in the diagnosis, prognosis and management of pancreatic adenocarcinoma: an evidence based appraisal. J Gastrointest Oncol 2012;3(2):105.
[7] Chan A, Prassas I, Dimitromanolakis A, Brand RE, Serra S, Diamandis EP, et al. Validation of biomarkers that complement CA19.9 in detecting early pancreatic cancer. Clin Canc Res 2014;20(22):5787-95.
[8] Du Y, Liu M, Gao J, Li Z. Aberrant microRNAs expression patterns in pancreatic cancer and their clinical translation. Canc Biother Radiopharmaceuticals 2013;28(5):361-9.
[9] Costello E, Greenhalf W, Neoptolemos JP. New biomarkers and targets in pancreatic cancer and their application to treatment. Nat Rev Gastroenterol Hepatol 2012;9(8):435.
[10] Jansson MD, Lund AH. MicroRNA and cancer. Molecular oncology 2012;6(6):590-610.
[11] Shamsi R, Seifi-Alan M, Behmanesh A, Omrani M, Mirfakhraie R, Ghafouri-Fard S. A bioinformatics approach for identification of miR-100 targets implicated in breast cancer. Cell Mol Biol 2017;63(10).
[12] van Schooneveld E, Wildiers H, Vergote I, Vermeulen PB, Dirix LY, Van Laere SJ. Dysregulation of microRNAs in breast cancer and their potential role as prognostic and predictive biomarkers in patient management. Breast Canc Res 2015;17(1):21.
[13] Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, et al. MicroRNA expression profiles classify human cancers. Nat Protoc 2005;435(7043):834.
[14] Kotlarchyk A, Khoshgoftaar T, Pavlovic M, Zhuang H, Pandya AS. Identification of microRNA biomarkers for cancer by combining multiple feature selection techniques. J Comput Methods Sci Eng 2011;11(5-6):283-98.
[15] Waspada I, Wibowo A, Meraz NS. Supervised machine learning model for microrna expression data in cancer. Jurnal Ilmu Komputer dan Informasi 2017;10(2):108-15.
[16] Muhamed Ali A, Zhuang H, Ibrahim A, Rehman O, Huang M, Wu A. A machine learning approach for the classification of kidney cancer subtypes using miRNA genome data. Appl Sci 2018;8(12):2422.
[17] Xia J, Gill EE, Hancock RE. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat Protoc 2015;10(6):823.
[18] Rini DP, Shamsuddin SM, Yuhaniz SS. Particle swarm optimization: technique, system and challenges. International journal of computer applications 2011;14(1):19-26.
[19] Genetic cnn. In: Xie L, Yuille A, editors. Proceedings of the IEEE international conference on computer vision; 2017.
[20] Zhang L, Tang Y, Hua C, Guan X. A new particle swarm optimization algorithm with adaptive inertia weight based on Bayesian techniques. Appl Soft Comput 2015;28:138-49.
[21] Boeringer DW, Werner DH. Particle swarm optimization versus genetic algorithms for phased array synthesis. IEEE Trans Antenn Propag 2004;52(3):771-9.
[22] Kiani MKD, Ghobadian B, Tavakoli T, Nikbakht A, Najafi G. Application of artificial neural networks for the prediction of performance and exhaust emissions in SI engine using ethanol-gasoline blends. Energy 2010;35(1):65-9.
[23] Mohapatra S, Ganesh K, Punniyamoorthy M, Susmitha R. Developing a classification model using ANN. In: Service quality in Indian hospitals: perspectives from an emerging market. Cham: Springer International Publishing; 2018. p. 53-61.
[24] Sharma S. Activation functions in neural networks. Data Sci 2017;6.
[25] Yang W, Wang K, Zuo W. Neighborhood component feature selection for high-dimensional data. J Clin Psychol 2012;7(1):161-8.
[26] Neighbourhood components analysis. In: Goldberger J, Hinton GE, Roweis ST, Salakhutdinov RR, editors. Advances in neural information processing systems; 2005.
[27] Nagy A, Lanczky A, Menyhárt O, Győrffy B. Validation of miRNA prognostic power in hepatocellular carcinoma using expression data of independent datasets. Sci Rep 2018;8(1):9227.
[28] Tokar T, Pastrello C, Rossos AE, Abovsky M, Hauschild A-C, Tsay M, et al. mirDIP 4.1-integrative database of human microRNA target predictions. Nucleic Acids Res 2017;46(D1):D360-70.
[29] Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 2016:gkw937.
[30] Smoot ME, Ono K, Ruscheinski J, Wang P-L, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2010;27(3):431-2.
[31] Duell EJ, Lujan-Barroso L, Sala N, Deitz McElyea S, Overvad K, Tjonneland A, et al. Plasma microRNAs as biomarkers of pancreatic cancer risk in a prospective cohort study. Clinical Chemistry 2017;141(5):905-15.
[32] Ganepola GA, Rutledge JR, Suman P, Yiengpruksawan A, Chang DH. Novel blood-based microRNA biomarker panel for early diagnosis of pancreatic cancer. World J Gastrointest Oncol 2014;6(1):22.
[33] Liu R, Chen X, Du Y, Yao W, Shen L, Wang C, et al. Serum microRNA expression profile as a biomarker in the diagnosis and prognosis of pancreatic cancer. Clinical Chemistry 2012;58(3):610-8.
[34] Diamandis EP. The failure of protein cancer biomarkers to reach the clinic: why, and what can be done to address the problem? BMC Med 2012;10(1):87.
[35] Ding L, Wendl MC, McMichael JF, Raphael BJ. Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet 2014;15(8):556-70.
[36] Lin M-S, Chen W-C, Huang J-X, Gao H-J, Sheng H-H. Aberrant expression of microRNAs in serum may identify individuals with pancreatic cancer. Int J Clin Exp Med 2014;7(12):5226.
[37] Xu C, Zhang L, Li H, Liu Z, Duan L, Lu C. MiRNA-1469 promotes lung cancer cells apoptosis through targeting STAT5a. American journal of cancer research 2015;5(3):1180.
[38] Zhang Y, Fang J, Zhao H, Yu Y, Cao X, Zhang B. Downregulation of microRNA-1469 promotes the development of breast cancer via targeting HOXA1 and activating PTEN/PI3K/AKT and Wnt/β-catenin pathways. J Cell Biochem 2019;120(4):5097-107.
[39] Huang W, Li J, Guo X, Zhao Y, Yuan X. miR-663a inhibits hepatocellular carcinoma cell proliferation and invasion by targeting HMGA2. Biomed Pharmacother 2016;81:431-8.
[40] Zhang Y, Xu X, Zhang M, Wang X, Bai X, Li H, et al. MicroRNA-663a is downregulated in non-small cell lung cancer and inhibits proliferation and invasion by targeting JunD. BMC Canc 2016;16(1):315.
[41] Ma Q, Zhang Y, Liang H, Zhang F, Liu F, Chen S, et al. EMP3, which is regulated by miR-663a, suppresses gallbladder cancer progression via interference with the MAPK/ERK pathway. Canc Lett 2018;430:97-108.
[42] Jiao L, Deng Z, Xu C, Yu Y, Li Y, Yang C, et al. MiR-663 induces castration-resistant prostate cancer transformation and predicts clinical recurrence. J Cell Physiol 2014;229(7):834-44.
[43] Hu S, Zheng Q, Wu H, Wang C, Liu T, Zhou W. miR-532 promoted gastric cancer migration and invasion by targeting NKD1. Life Sci 2017;177:15-9.
[44] Xu X, Zhang Y, Liu Z, Zhang X, Jia J. MiRNA-532-5p functions as an oncogenic microRNA in human gastric cancer by directly targeting RUNX3. J Cell Mol Med 2016;20(1):95-103.
[45] Wang F, Chang JT-H, Kao CJ, Huang RS. High expression of miR-532-5p, a tumor suppressor, leads to better prognosis in ovarian cancer both in vivo and in vitro. Mol Canc Therapeut 2016;15(5):1123-31.
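
As a supplementary illustration of the feature selection types described in the Discussion, the following minimal Python sketch contrasts a filter ranking with a wrapper search. It is a toy under stated assumptions and not the pipeline used in this study: the data are synthetic, the wrapper uses greedy forward selection around a nearest-centroid classifier in place of PSO + ANN, and all function names (make_data, filter_scores, loo_accuracy, wrapper_forward_selection) are hypothetical.

```python
import random
from statistics import fmean, pstdev


def make_data(n_per_class=40, n_features=6, informative=(0, 1), shift=4.0, seed=0):
    """Synthetic stand-in for an expression matrix (rows: samples, columns: features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:  # only these features separate the classes
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def filter_scores(X, y):
    """Filter method: one univariate score per feature, no classifier involved."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = (pstdev(a) + pstdev(b)) / 2 or 1e-12
        scores.append(abs(fmean(a) - fmean(b)) / spread)
    return scores


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [fmean(r[j] for r in rows) for j in features]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def wrapper_forward_selection(X, y):
    """Wrapper method: greedily add the feature that most improves classifier accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        acc, j = max((loo_accuracy(X, y, selected + [j]), j) for j in remaining)
        if acc <= best:  # stop when no candidate improves the cross-validated score
            break
        selected.append(j)
        remaining.remove(j)
        best = acc
    return selected, best


X, y = make_data()
scores = filter_scores(X, y)
filter_top2 = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
wrapper_subset, wrapper_acc = wrapper_forward_selection(X, y)
print("filter ranking (top 2):", filter_top2)
print("wrapper subset:", wrapper_subset, "LOO accuracy:", round(wrapper_acc, 3))
```

The filter score is computed once per feature, whereas the wrapper re-evaluates the classifier under cross-validation for every candidate subset, which is the source of the extra computational cost noted in the Discussion.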