
Chapter Two


2.1 Introduction

A literature review, commonly known as a review of related works, is an
organized method of gathering data about a certain topic or study area. In
this investigation, we employ data mining techniques as the scientific approach.

Applications for data mining are becoming increasingly commonplace in a

variety of industries, including public health, e-business, marketing, and
retail. Due to this, its application in knowledge discovery and the
identification of intriguing patterns from massive amounts of data has
grown in prominence.[44] On a daily basis, enormous volumes of medical
data are produced by independent researchers, clinical settings, community
monitoring programs, and reports from various medical settings, such as
test results and patient records. These data are available in hundreds of
public and private databases. Thus, researchers and practitioners are
now facing the problem of data overload. These data need to be
effectively organized and analyzed in order to extract useful knowledge
for sound decision making [44]. To handle these massive data sources and
extract complicated relationships from the data that are challenging for
traditional statistical tools to evaluate, new computational techniques are
required. This is the main reason why data mining has emerged as a new
information technology technique [44]. In this study, different
data mining algorithms and their working paradigms are considered and
explained in detail. The data mining algorithms studied here are
classification algorithms (Naive Bayes, Bayes network, J48, random forest,
multilayer perceptron, and logistic regression). Classification assigns each
object in the dataset to a class according to its similarities, and it is the
most popular and often applied DM technique. The objective of a
classification method is to correctly predict the target class of objects for
which the class label is unknown (41).

The term ‘jaundice’ is used to describe the yellow-orange discoloration of
the skin and sclera because of excessive bilirubin in the skin and mucous
membranes[1]. Jaundice is caused by the presence of unconjugated bilirubin
(lipid-soluble), which is insoluble in water, making it difficult to be excreted.
Neonates, especially premature neonates, have immature livers and immature
conjugation capacity; therefore, they cannot excrete bilirubin at the same rate at
which it is produced, leading to an increase in bilirubin levels in the
neonate’s blood[3].
Most neonates develop jaundice within the first week of life, and in some
cases it is mild and harmless; however, a rapid increase in bilirubin (>20
mg/dL) can reach toxic levels[4]. This results in complications such as
bilirubin encephalopathy and kernicterus, with a significant risk of neonatal
mortality and long-term neurological damage, and in complications such as
sensorineural hearing loss, cerebral palsy, intellectual difficulties, upward
gaze palsy, seizure, gross dental dysplasia, and neurodevelopmental delay later
in life [5].
Neonatal jaundice is a common clinical problem worldwide. It is estimated
that it affects at least 481,000 term or near-term newborns annually, causing
114,000 deaths and more than 63,000 cases of moderate or severe
disability[6]. Globally, an estimated 6 out of every 10 babies develop
jaundice, and every year about 1.1 million babies develop severe
hyperbilirubinemia with or without bilirubin encephalopathy; the vast
majority reside in sub-Saharan Africa and South Asia[7, 8]. In sub-Saharan
Africa, neonatal jaundice is the 8th leading cause of
neonatal mortality and rehospitalization[9].
It has been pointed out that the diagnosis of neonatal jaundice needs to
improve in order to prevent severe hyperbilirubinemia and kernicterus. Hence, it is
important to explore new methodologies, such as data mining, that may
provide better results than the traditional methods.
Early therapy of neonatal jaundice depends on accurately identifying infants
at risk of severe hyperbilirubinemia and kernicterus. However, in babies with
dark skin it is hard to tell whether they are jaundiced, so the skin on the
baby's nose or forehead is gently pressed; if the baby is jaundiced, the skin
appears yellow when the finger is lifted[10].
Jaundice is the most prevalent clinical symptom[11, 12]. It accounts for
70% and 10% of neonatal morbidity and mortality, respectively,
worldwide[13]. Severe neonatal jaundice accounted for 2.8% of neonatal
deaths in the UK, 30.8% in India, 34% in Nigeria, 14% in Kenya, and
6.7% in Egypt[8]. Of the total neonatal mortality, about 75% occurred in
South Asia and sub-Saharan Africa[5, 14]. A study conducted in Nigeria showed
that 35.0% of NICU admissions were due to neonatal jaundice[15]. It is also one of
the most common causes of neonatal mortality and is responsible for 6.7% of
neonatal deaths in Ghana[16].

2.3 Jaundice in Ethiopia

Ethiopia is one of the top 10 countries in jaundice-related neonatal
mortality[17]. Globally, 2.6 million newborns died in 2016, and half
of these deaths occurred in India, Pakistan, Nigeria, the Democratic
Republic of Congo, and Ethiopia[14]. More than 34.5% of newborn deaths
occur within the first 28 days after birth, and hyperbilirubinemia was
among the causes of neonatal admission and death[18]. An institution-based
study revealed that jaundice was one of the significant predictors
of neonatal mortality, especially under 7 days of age[19]. The Ethiopian
Federal Ministry of Health (FMOH) has implemented different strategies to
improve maternal and neonatal health, including expanding healthcare
facilities, increasing skilled health professionals, and increasing the
availability of supplies. Despite these actions, the neonatal
mortality rate is still 29 per 1,000 live births[20].
Risk factors

Predictors of neonatal jaundice were classified as neonatal, obstetric,
maternal, and medical factors. Among neonatal and obstetric factors, preterm
birth (< 37 weeks of gestational age)[21]; male sex, low birth weight (LBW;
< 2,500 g) or small for gestational age (SGA), prolonged labor, and
primiparity[22]; normal and oxytocin-assisted delivery[23]; low Appearance,
Pulse, Grimace response, Activity, Respiration (APGAR) score and birth
asphyxia[24]; duration of labor[25]; multiple pregnancies[26]; and vacuum
extraction[27] were significantly associated with neonatal jaundice.

Maternal and medical predictors that had a significant association with
neonatal jaundice were as follows: drug use during pregnancy[28]; maternal
age, body mass index, and hemoglobin level[29]; and thyroid-stimulating
hormone[30]. In addition to the above, ABO and Rh incompatibility, sepsis,
and total serum bilirubin level[31]; maternal smoking status[27]; low maternal
educational status and parity[32]; and maternal O blood group[33] were
significant predictors of neonatal jaundice. Postnatal factors include birth
trauma, infections, inadequate breastfeeding and dehydration in the first few
days of life, and exclusive breastfeeding (late-onset hyperbilirubinemia[34-

Even though neonatal jaundice is not totally preventable, with early
detection and treatment it is possible to avert irreversible complications. The
correct identification of newborns at risk of developing severe hyperbilirubinemia
and kernicterus is essential for early treatment. Therefore, protecting
newborns from toxic bilirubin levels, especially given their immature central
nervous system, has become a main concern for pediatricians.
2.4 Data mining technology

Data mining derives its name from the similarities between searching for
valuable information in a large database and mining a mountain for a vein of
valuable ore. Both processes require either sifting through an immense amount
of material or intelligently probing it to find where the value resides[37, 38].
Data mining is the computer-assisted process of digging through and analyzing
enormous sets of data and then extracting the meaning of the data. Data mining tools
predict behaviors and future trends, allowing businesses to make proactive,
knowledge-driven decisions. Data mining tools can answer business
questions that were traditionally too time-consuming to resolve. They scour
databases for hidden patterns, finding predictive information that experts
may miss because it lies outside their expectations[39].

The need to understand large, complex, information-rich data sets is common to

virtually all fields of business, science, and engineering. In the business world,
corporate and customer data are becoming recognized as a strategic asset.
The ability to extract useful knowledge hidden in these data and to act on that
knowledge is becoming increasingly important in today’s competitive world. The
entire process of applying a computer-based methodology, including new
techniques, for discovering knowledge from data is called data mining.

Data mining is an iterative process within which progress is defined by discovery,
through either automatic or manual methods. Data mining is most useful in an
exploratory analysis scenario in which there are no predetermined notions
about what will constitute an “interesting” outcome. Data mining is the search
for new, valuable, and nontrivial information in large volumes of data. In
practice, the two primary goals of data mining tend to be prediction and
description. Prediction involves using some variables or fields in the data set
to predict unknown or future values of other variables of interest. On the
predictive end of the spectrum, the goal of data mining is to produce a model,
expressed as executable code, that can be used to perform classification.
Description, on the other hand, focuses on finding patterns describing the data
that can be interpreted by humans.
Data mining has been defined in almost as many ways as there are authors
who have written about it. According to Berry and Linoff (28), data mining is
the process of exploration and analysis, by automatic or semiautomatic means,
of large quantities of data in order to discover meaningful patterns and rules.
Data mining usually makes sense when there is a large amount of data. For this
reason, most of the algorithms developed for data mining require a
large volume of data to build and train models that are
responsible for different data mining tasks such as classification, clustering,
prediction, and association. The need for bulky data can be explained by a
couple of reasons. First, in the case of small databases, it is feasible to
capture appealing trends and relationships with traditional tools
such as spreadsheets and database queries. Second,
most data mining tools demand a large amount of training data (data
used for building a model) in order to generate unbiased models. The rationale
is simple and straightforward: small training data results in unreliable
generalizations based on chance patterns.

According to Witten and Frank, data mining is valuable for discovering implicit,
potentially useful information from huge data stored in databases by
building computer programs that sift through databases automatically or
semi-automatically, seeking meaningful patterns[40]. The opportunity for the
application of data mining has increased significantly as databases have grown
enormously and new machines with searching capabilities have evolved.

Common Tasks of Data Mining

Data mining tasks can be conveniently classified into numerous categories, each of
which corresponds to a particular goal for the data analyst. The following is how
data mining tasks are categorized in the book Principles of Data Mining[41].

Exploratory Data Analysis (EDA): As the name implies, the objective here is to
merely examine the data without having a specific notion of what we are trying to
find. EDA approaches are typically interactive and visual, and for small,
low-dimensional data sets there are numerous effective graphical display methods.

Descriptive Modeling: The goal of a descriptive model is to describe all of the
data (or the process generating the data). Examples of such descriptions
include models for the overall probability distribution of the data (density
estimation), partitioning of the p-dimensional space into groups (cluster
analysis and segmentation), and models describing the relationship between
variables (dependency modeling). In segmentation analysis, for example, the
aim is to group together similar records, as in market segmentation of
commercial databases. Here the goal is to split the records into
homogeneous groups so that similar people (if the records refer to people) are
put into the same group. This enables advertisers and marketers to
efficiently direct their promotions to those most likely to respond. The number
of groups here is chosen by the researcher; there is no "right" number.
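
As a minimal sketch of the segmentation idea described above, the following Python code groups one-dimensional values with a basic k-means loop. The customer ages, the function name kmeans_1d, the evenly spread starting centers, and the choice of k = 3 are illustrative assumptions and are not taken from the cited book; consistent with the text, the number of groups is supplied by the analyst.

```python
def kmeans_1d(values, k, max_iter=100):
    """Group numbers into k clusters with a basic k-means loop (illustrative)."""
    data = sorted(values)
    n = len(data)
    # Assumption: start from k points spread evenly over the sorted data.
    centers = [float(data[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put every value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # stop once the centers no longer move
            break
        centers = new_centers
    return centers, clusters


# Hypothetical customer ages, used only to illustrate segmentation.
ages = [21, 23, 22, 45, 47, 46, 70, 72]
centers, clusters = kmeans_1d(ages, k=3)
print(centers)
print(clusters)
```

Running the sketch with a different k yields a different, equally valid partition, which illustrates why there is no "right" number of groups.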

Predictive Modeling: Classification and Regression: The aim here is to build

a model that will permit the value of one variable to be predicted from the
known values of other variables. In classification, the variable being predicted
is categorical, while in regression the variable is quantitative. The term
"prediction" is used here in a general sense, and no notion of a time continuum
is implied. So, for example, while we might want to predict the value of
the stock market at some future date, or which horse will win a race, we might
also want to determine the diagnosis of a patient, or the degree of brittleness
of a weld.
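
The distinction between the two predictive tasks can be sketched in a few lines of Python. The example is only an illustration under stated assumptions: the one-nearest-neighbour rule and the least-squares line are simple stand-ins chosen for brevity, and the data values and function names are hypothetical. The first function predicts a categorical variable (classification), while the second fits a model for a quantitative variable (regression).

```python
def nearest_neighbour_label(train, query):
    """Classification: return the categorical label of the closest training point."""
    closest = min(train, key=lambda row: abs(row[0] - query))
    return closest[1]


def fit_line(xs, ys):
    """Regression: ordinary least squares for the line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


# Hypothetical labelled observations: (measurement, class label).
labelled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
predicted_class = nearest_neighbour_label(labelled, 7.5)

# Hypothetical quantitative observations that follow y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_value = a + b * 5

print(predicted_class, predicted_value)
```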

Discovering Patterns and Rules: The three types of tasks listed above are
concerned with model building. Other data mining applications are concerned
with pattern detection.
One example is spotting fraudulent behavior by detecting regions of the space
defining the different types of transactions where the data points are significantly
different from the rest. Another use is in astronomy, where detection of unusual
stars or galaxies may lead to the discovery of previously unknown phenomena.
Yet another is the task of finding combinations of items that occur frequently
in transaction databases (e.g., grocery products that are often purchased
together). Data mining experts have given this problem a lot of attention, and
algorithmic methods based on association rules have been used to solve it.
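
The frequent-combination task can be illustrated with the two standard association rule measures, support and confidence. The sketch below is a brute-force illustration only: the grocery baskets, the function names, and the minimum support of 0.5 are hypothetical, and practical association rule algorithms avoid enumerating every pair in this way.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_pairs(transactions, min_support):
    """Return all item pairs whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(set(pair), transactions)
        if s >= min_support:
            result[pair] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent); antecedent must occur at least once."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Hypothetical grocery baskets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
pairs = frequent_pairs(baskets, min_support=0.5)
rule_confidence = confidence({"butter"}, {"bread"}, baskets)
print(pairs, rule_confidence)
```

In this toy data every basket that contains butter also contains bread, so the rule "butter implies bread" has a confidence of 1.0 even though the pair appears in only half of the baskets.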

Retrieval by Content: Here the user has a pattern of interest and wishes to find
similar patterns in the data set. This task is most commonly used for text and
image data sets. For text, the pattern may be a set of keywords, and the user
may wish to find relevant documents within a large set of possibly relevant
documents (e.g., Web pages). For images, the user may have a sample image, a
sketch of an image, or a description of an image, and wish to find similar
images from a large set of images. In both cases the definition of similarity
is critical, but so are the details of the search strategy.
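
For text, one common way to define similarity is the cosine between word-count vectors. The following sketch assumes that definition; the document names, their contents, and the function names are hypothetical, and a real retrieval system would also need the search strategy mentioned above.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return names of documents sharing words with the query, best match first."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), name)
              for name, text in documents.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical document collection.
documents = {
    "doc_a": "neonatal jaundice risk factors",
    "doc_b": "stock market prediction",
    "doc_c": "jaundice treatment in newborns",
}
ranking = rank("neonatal jaundice", documents)
print(ranking)
```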

2.5 Process model of Data Mining

One of the greatest strengths of data mining is reflected in its wide range of
methodologies and techniques that can be applied to a host of problem sets[37].
Data mining tools perform data analysis and uncover important data patterns,
contributing greatly to business strategies and to fields such as medical research.
The widening gap between data and information calls for a systematic
development of data mining tools that will turn data tombs into golden nuggets
of knowledge. Thus, patterns and knowledge from data mining are used for
sound judgment and proactive decision making in different organizations, including
the health care sector.

Broadly used methodologies in data mining are KDD (Knowledge Discovery
in Databases), CRISP-DM (Cross-Industry Standard Process for Data Mining),
SEMMA (Sample, Explore, Modify, Model, Assess), and the hybrid process[42].

2.5.1 Knowledge Discovery in Database (KDD)

Many people treat data mining as a synonym for another popularly used term,
knowledge discovery from data, or KDD, while others view data mining as merely an
essential step in the process of knowledge discovery. The knowledge discovery process is
an iterative sequence of the following steps[43]:

1. Data cleaning (to remove noise and inconsistent data)

2. Data integration (where multiple data sources may be combined)

3. Data selection (where data relevant to the analysis task are retrieved from the
database)

4. Data transformation (in which summary or aggregate processes are used to
modify and combine data into formats suitable for mining)

5. Data mining (a crucial procedure that uses clever techniques to extract data
patterns)

6. Pattern evaluation (to identify the truly interesting patterns representing
knowledge based on interestingness measures)

7. Knowledge presentation (where users are presented with mined knowledge
through the use of visualization and knowledge representation techniques)
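
The seven steps can be sketched as a chain of small Python functions. This is an illustration only and not the method of the cited source: the toy records, the field names, the function names, and the minimum count are hypothetical, and the "mining" step is reduced to counting co-occurring values. The 2,500 g cut-off reuses the low birth weight definition given earlier in this chapter.

```python
from collections import Counter

# Hypothetical toy records from two sources; None marks a missing value.
source_a = [{"id": 1, "weight_g": 2300, "outcome": "jaundice"},
            {"id": 2, "weight_g": None, "outcome": "healthy"}]
source_b = [{"id": 3, "weight_g": 3400, "outcome": "healthy"},
            {"id": 4, "weight_g": 2100, "outcome": "jaundice"},
            {"id": 5, "weight_g": 3100, "outcome": "healthy"}]


def clean(records):                      # step 1: drop incomplete records
    return [r for r in records if None not in r.values()]


def integrate(*sources):                 # step 2: combine the data sources
    return [r for source in sources for r in source]


def select(records, fields):             # step 3: keep only the relevant fields
    return [{f: r[f] for f in fields} for r in records]


def transform(records):                  # step 4: discretize birth weight
    return [{"weight": "low" if r["weight_g"] < 2500 else "normal",
             "outcome": r["outcome"]} for r in records]


def mine(records):                       # step 5: count co-occurring values
    return Counter((r["weight"], r["outcome"]) for r in records)


def evaluate(patterns, min_count):       # step 6: keep sufficiently frequent patterns
    return {p: c for p, c in patterns.items() if c >= min_count}


data = integrate(clean(source_a), clean(source_b))
data = transform(select(data, ["weight_g", "outcome"]))
patterns = evaluate(mine(data), min_count=2)

for (weight, outcome), count in sorted(patterns.items()):  # step 7: present
    print(f"{weight} birth weight with outcome {outcome}: {count} records")
```

Because the process is iterative, an unsatisfactory result at the evaluation step would normally send the analyst back to an earlier step, for example to select different fields.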

2.5.2 CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) model is a
frequently referenced industry framework. CRISP-DM offers a general
model for data/text mining projects, highlighting the key tasks involved.
According to the CRISP-DM framework, the life cycle of a knowledge discovery
project consists of six phases, but the sequence of the phases is not strictly
applied. Moving back and forth between different phases is always required.
The process is iterative because the choice of subsequent phases often
depends on the outcome of preceding phases. The life cycle begins with
business understanding to ground the overall aims of the project, and then
moves to data understanding to identify potential inputs and outputs, data
quality issues, and potential privacy or security concerns. The third phase,
data preparation, involves the extraction of relevant data for a particular
modeling effort, data quality assurance, and any transformations required
for specific modeling techniques. Typically, the data preparation tasks
account for the majority of effort in a data mining project[37]. The fourth phase,
data modeling, is the central focus of any knowledge discovery effort and
consists of the construction of models based on a variety of techniques, with
evaluations (the fifth phase) conducted for all modeling techniques. The final
step is deployment so that useful models can be embedded in information
systems to support decision-making activities[39]. The next figure shows the
knowledge discovery steps of the CRISP-DM methodology:
2.5.3 SEMMA

The SAS Institute developed a data mining analysis cycle
known by the acronym SEMMA. This acronym stands for sample, explore,
modify, model, assess. Beginning with a statistically representative sample of
your data, SEMMA intends to make it easy to apply exploratory statistical and
visualization techniques, select and transform the most significant predictive
variables, model the variables to predict outcomes, and finally confirm a
model’s accuracy. A graphic representation of SEMMA is given in Figure 2.3.
By assessing the outcome of each stage in the SEMMA process, one can
determine how to model new questions raised by the previous results, and thus
proceed back to the exploration phase for additional refinement of the data[44].

The following are the steps in the SEMMA process:

1 Sample: the first step is to create one or more data tables by sampling data
from the data warehouse. Mining a representative sample instead of the entire
volume drastically reduces the processing time required to obtain business

2 Explore: after sampling the data, the next step is to explore the data visually
or numerically for trends or groupings. Exploration helps to refine the discovery
process. Techniques such as factor analysis, correlation analysis and clustering
are often used in the discovery process.
3 Modify: modifying the data refers to creating, selecting, and transforming
one or more variables to focus the model selection process in a particular
direction, or to modify the data for clarity or consistency.

4 Model: creating a data model involves using the data mining software to
search automatically for a combination of data that predicts the desired outcome

5 Assess: the last step is to assess the model to determine how well it
performs. A common means of assessing a model is to set aside a portion of the
data during the sampling stage. If the model is valid, it should work for both the
reserved sample and for the sample that was used to develop the model[44].
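
The assessment idea in step 5 can be sketched in Python as follows. The example is a simplified illustration under stated assumptions: the records are synthetic, the "model" is a single threshold rule rather than a real data mining algorithm, and the 30% holdout fraction, the random seed, and the function names are hypothetical choices.

```python
import random


def split(records, holdout_fraction=0.3, seed=42):
    """Sample: shuffle, then reserve a fraction of the records for assessment."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(threshold, records):
    """Assess: share of records for which the rule x >= threshold matches the label."""
    hits = sum((x >= threshold) == label for x, label in records)
    return hits / len(records)


def fit_threshold(train):
    """Model: choose the observed value that best separates the training labels."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic records: the label is True exactly when the value is at least 10.
records = [(x, x >= 10) for x in range(20)]
train, holdout = split(records)
threshold = fit_threshold(train)
train_accuracy = accuracy(threshold, train)
holdout_accuracy = accuracy(threshold, holdout)
print(threshold, train_accuracy, holdout_accuracy)
```

A large drop from the training accuracy to the accuracy on the reserved sample would suggest that the model has captured chance patterns in the training data rather than a general relationship.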

The SEMMA approach is completely compatible with the CRISP approach.

Both aid the knowledge discovery process. Once models are obtained and tested,
they can then be deployed to gain value with respect to business or research
2.5.4 Hybrid Data mining
The development of both academic (particularly KDD) and industry-oriented
(CRISP-DM and others) data mining models has led to the growth of
hybrid models, i.e., models that combine the features and tasks of both. The hybrid
model was developed mainly on the basis of the CRISP-DM model by adapting it
to academic research. The main differences and extensions include
the introduction of several new explicit feedback mechanisms and, in the last step,
the possibility that the knowledge discovered for a particular domain may be
applied in other domains.

The hybrid DM model consists of a six-step knowledge discovery process. According to
Cios et al.[44], the six steps are as follows:

1. Understanding of the problem domain: - This initial step involves

working closely with domain experts to define the problem and
determine the project goals, identifying key people, and learning about
current solutions to the problem. It also involves learning domain specific
2. Understanding of the data: - This step includes collecting sample data and
deciding which data, including format and size, will be needed. Background
knowledge can be used to guide these efforts. Data are checked for
completeness, redundancy, missing values, plausibility of attribute values,
etc. Finally, the step includes verification of the usefulness of the data with
respect to the DM goals.
3. Preparation of the data: - This step concerns deciding which data will be
used as input for DM methods in the subsequent step. It involves
sampling, running correlation and significance tests, and data cleaning,
which includes checking the completeness of data records, removing or
correcting for noise and missing values, etc. The end results are data that
meet the specific input requirements for the DM tools selected in Step 1.
4. Data mining: - Here the data miner uses various DM methods to
derive knowledge from preprocessed data.
5. Evaluation of the discovered knowledge: - Evaluation includes
understanding the results, checking whether the discovered knowledge is
novel and interesting, interpretation of the results by domain experts, and
checking the impact of the discovered knowledge.
6. Use of the discovered knowledge: - This final step consists of planning
where and how to use the discovered knowledge. The application area in
the current domain may be extended to other domains. A plan to monitor
the implementation of the discovered knowledge is created and the entire
project documented. Finally, the discovered knowledge is deployed.
2.6 Data mining algorithm
Data mining is used to find hidden information in a database by developing a
model that best fits the data. Data mining
functionalities (algorithms) are used to specify the kind of patterns to be found in
data mining tasks. The ability to extract useful knowledge hidden in the data and to
act on that knowledge is becoming increasingly important in today's competitive
world. The entire process of applying a computer-based methodology, including
new techniques, for discovering knowledge from data is the core function of data
mining. It searches for new, valuable, and nontrivial information in large volumes
of data[37]. Data mining tasks are in general classified into two main categories
(13): predictive and descriptive.

Predictive data mining tasks build a model of the system described by the
given dataset that permits the value of an unknown variable to be
predicted from the known values of other variables[45]. Predictive mining
involves using some variables or fields in the dataset to predict unknown or
previously unseen future values of other variables of interest. It is usually used to
create a model that relates a set of predictors to the dependent variables.
Examples of predictive modeling include classification and prediction.

The second category of data mining function is the descriptive mining task,
which is used to characterize the general properties of the
data in the database[37]. It produces new, nontrivial information based on
the available dataset and aims to provide an understanding of the analyzed system
by uncovering patterns and relationships in large datasets. The goal of a
descriptive model is to describe all of the data or the process generating the
data (13). Examples of descriptive data mining are clustering, summarization,
association rule discovery, and sequence discovery. The following are some
examples from both kinds of data mining task, showing how they work in a real
pattern discovery process.
2.7 Summary of related work

Neonatal jaundice has significant importance in neonatal morbidity and mortality
worldwide. It occurs in up to 60% of term and 80% of preterm newborns. Globally,
every year, about 1.1 million babies develop severe hyperbilirubinemia with or
without bilirubin encephalopathy. An earlier report indicated that, out of the
130 million babies born per year, approximately 4 million died within their
neonatal period. Of the total neonatal mortality secondary to jaundice
complications, about 75% occurred in South Asia and sub-Saharan Africa.

In Ethiopia, neonatal mortality and morbidity are among the highest in the world:
more than one-third of childhood deaths occur within the first 28 days of
age. Studies have revealed that the incidence, etiology, and risk factors of neonatal
jaundice vary according to ethnicity, economic status, and geographical differences
between countries.

Many studies have been conducted to provide solutions for these problems; still,
the studies have some limitations.

No. | Ref. | Methodology | Objective
1 | 37 | Descriptive and logistic regression analysis. | Identifying the prevalence of and factors associated with neonatal mortality at Ayder Comprehensive Specialized Hospital, Northern Ethiopia.
2 | 27 | Cross-sectional study; mothers and infants were conveniently sampled after delivery and before discharge. | To determine the prevalence of neonatal jaundice and, secondly, to explore its risk factors in healthy term neonates.
3 | 39 | The study followed the different phases of the Cross Industry Standard Process for Data Mining model as its methodology. | To improve the diagnosis of neonatal jaundice with the application of data mining techniques.
4 | 40 | Random sampling of newborn children and analysis of the gathered data. | To determine the prevalence and predisposing factors of neonatal jaundice in a health-care facility in Delta State.
5 | 18 | A cross-sectional study was conducted. | To assess the magnitude and predictors of neonatal jaundice among neonates admitted to the neonatal intensive care unit of public hospitals in Mekelle city, Northern Ethiopia.
6 | (22) | A prospective cohort study was applied. | To build and validate a prediction model for severe hyperbilirubinemia using umbilical cord blood bilirubins (CBB).
7 | (25) | Case-control study with cross-sectional | To identify the possible factors associated with neonatal jaundice and assess maternal knowledge level of this
8 | (8) | Statistical analysis: systematic review and meta-analysis using meta-analytical technique. | Determining the burden of severe neonatal jaundice, defined by clinical jaundice associated with clinical outcomes including acute bilirubin encephalopathy/kernicterus and/or exchange transfusion (ET) and/or jaundice-related death. Because the study was retrospective, the authors did not have the opportunity to actively enquire about the application of dusting powder on the subject as a possible cause of neonatal jaundice (NNJ).
9 | 15 | Systematic analysis using sample means and percentages. | To determine the prevalence and associated factors of neonatal jaundice in Federal Medical Centre, Abakaliki, Ebonyi State, South East Nigeria.
10 | 43 | A facility-based cross-sectional study was conducted among | To determine the magnitude and associated factors of jaundice in newborns admitted to public hospitals in south
2.8 Empirical Findings of Literature Reviews

Study one: This study shows a high rate of neonatal mortality. Neonatal mortality
was highly associated with primipara, prematurity, low birth weight, perinatal
asphyxia, respiratory distress syndrome, congenital anomaly, neonatal sepsis and
duration of hospital stay.

Study two: The aim was to determine the prevalence of neonatal jaundice and,
secondly, to explore its risk factors in healthy term neonates. The main findings
of this study showed that data mining techniques are important and valid
approaches for the prediction of neonatal hyperbilirubinemia.

Study three: To determine the prevalence and predisposing factors of neonatal
jaundice in a health-care facility in Delta State, babies were checked every day
between June 2009 and June 2010 for signs of jaundice.

Study four: The babies in the study facility frequently suffered from
neonatal jaundice, which was more common in babies of mothers with lower
educational levels and in infants whose parents had separated. The study covered
272 babies (aged 1-30 days) at the Neonatal Clinic of the Department of Child
Health, Central Hospital, Warri, Delta State, starting in 2009. The mothers'
sociodemographic information was collected using a semi-structured
questionnaire. Newborns were randomly sampled, and the gathered data were analyzed.

Study five: The magnitude of neonatal jaundice among neonates was found to be
high. Duration of labor, time of delivery, sexes of neonate, sepsis, maternal blood
group, and blood type incompatibility were significantly associated with neonatal
jaundice. Therefore, improving newborn care and timely intervention for neonates
with ABO/Rh incompatibility are recommended.
Study six: This study's primary aim was to build and validate a prediction
model for severe hyperbilirubinemia using umbilical cord blood bilirubins
(CBB). Using a prospective cohort design, the study considered whether the
combination of CBB with gestational age and maternal race predicts neonatal
hyperbilirubinemia.
Study Seven: Low neonatal birth weight and prolonged duration of labour are
associated with neonatal jaundice. Mothers had inadequate knowledge of
neonatal jaundice and its causes. Therefore, during routine prenatal visits,
healthcare professionals should focus primarily on providing more education about
the illness and its causes.

2.9 Research Gap

For # 1 This study had limitations because the study assessed on small sample and
a significant number of sample charts were incomplete. This study was done in a
single center as a result; the prevalence may not reflect the overall prevalence in
the community. In addition, as a cross-sectional study design, this study does not
show cause-and-effect relationships.

For #2: As a limitation, this study did not consider some essential predictors,
such as thyroid-stimulating hormone and glucose-6-phosphate dehydrogenase. A
further limitation is that convenience sampling was used, which contributed to
a small study sample and the high percentage of babies born via C-section. The
low numbers of participants with certain risk factors (e.g. smoking or alcohol use)
made it difficult to investigate associations. Data on the gravidity of the mothers
and the gender of the babies, which were identified as risk factors in some
studies, were not collected.

For #3: A bigger sample could have improved the result (the study used 70
variables), and the study considered a limited set of associated risk factors
(father and mother information, sibling information, gestational information,
delivery information, and clinical information for the complete hospital stay).

For #4: This study is limited by its small sample, which was restricted to
deliveries at a health care center. The gathered data were analyzed without any
tool, and only a limited set of variables was analyzed.

For #5: This study included a total of 209 neonates, which is a small number, and
considered only babies in the intensive care units of public hospitals. It did not
consider babies born at home and later brought to a health center.

For #6: This is a single-site study, and further validation in a larger, multi-site
study is warranted. The study considered only the combination of umbilical cord
blood bilirubins (CBB) with gestational age and maternal race to predict neonatal
hyperbilirubinemia.

For #7: The sample was limited to one hundred and fifty (150) neonates, comprising
100 with clinically evident jaundice and 50 without jaundice, who were
conveniently recruited from the Trauma and Specialist Hospital in the Effutu
Municipality. Blood samples were collected to determine only a few risk factors:
serum bilirubin, glucose-6-phosphate dehydrogenase (G6PD) status, and blood
group (ABO and Rhesus).

For #8: Being retrospective, the study did not afford the authors the opportunity
to actively enquire about the application of dusting powder on the subjects as a
possible cause of neonatal jaundice (NNJ).

The amount of data used by the second researchers was very small, and data
collection was limited to a small number of patients born during the data
collection period; data on previously born jaundice patients were not analyzed.

The study of the second algorithm used only the variables gestational age and
newborn blood group (ABO).

Even though there are various factors for jaundice, the factors related to it have
been explained only to a limited extent. The etiology and risk factors of neonatal
jaundice vary according to ethnicity, economic status, and geographical differences
between countries.

Most of the studies were conducted using a cross-sectional design and systematic
data analysis. The studies relied on decision tree classification algorithms;
other types of classification algorithms were not used by the researchers for
comparative algorithm analysis.
To my knowledge, no study has used data mining to predict jaundice from the
factors responsible for its occurrence.

Therefore, this study will apply data mining techniques for predicting the jaundice
status of newborns. Specifically, it will identify the determinant attributes of the
jaundice status of newborn babies and build the best prediction model.
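
To make the idea of comparing prediction models concrete, the following is a
minimal sketch, not the method of this study. It assumes a handful of synthetic
records invented purely for illustration (they are not data from any study
reviewed here), hypothetical attribute names loosely based on risk factors named
in the review, a simple categorical Naive Bayes classifier with Laplace
smoothing, a majority-class baseline, and leave-one-out evaluation as one
possible way to compare them.

```python
import math
from collections import Counter

# Synthetic toy records, invented for illustration only (NOT study data).
# Attribute order (hypothetical names): abo_incompatible, sepsis, prolonged_labour
DATA = [
    (("yes", "no", "yes"), "jaundice"),
    (("yes", "yes", "no"), "jaundice"),
    (("yes", "no", "no"), "jaundice"),
    (("no", "yes", "yes"), "jaundice"),
    (("no", "no", "no"), "healthy"),
    (("no", "no", "yes"), "healthy"),
    (("no", "no", "no"), "healthy"),
    (("no", "yes", "no"), "healthy"),
    (("no", "no", "no"), "healthy"),
    (("no", "no", "yes"), "healthy"),
]


def train_naive_bayes(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = Counter()
    for features, label in rows:
        for index, value in enumerate(features):
            value_counts[(label, index, value)] += 1
    return class_counts, value_counts


def predict_naive_bayes(model, features, values_per_attr=2):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        count = class_counts[label]
        score = math.log(count / total)  # log prior
        for index, value in enumerate(features):
            seen = value_counts[(label, index, value)]
            score += math.log((seen + 1) / (count + values_per_attr))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def naive_bayes_classifier(training, features):
    """Train on the given rows, then classify one feature tuple."""
    return predict_naive_bayes(train_naive_bayes(training), features)


def majority_classifier(training, features):
    """Baseline: always predict the most frequent class in the training rows."""
    return Counter(label for _, label in training).most_common(1)[0][0]


def leave_one_out_accuracy(rows, classifier):
    """Hold out each record once, train on the rest, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        if classifier(training, features) == label:
            correct += 1
    return correct / len(rows)


if __name__ == "__main__":
    for name, clf in [("majority baseline", majority_classifier),
                      ("naive Bayes", naive_bayes_classifier)]:
        print(name, leave_one_out_accuracy(DATA, clf))
```

The same evaluation loop could be reused with any other classifier that accepts
training rows and a feature tuple, so that several candidate models are compared
on identical held-out records.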

1. Mitra, S. and J. Rennie, Neonatal jaundice: aetiology, diagnosis and treatment. British Journal of
Hospital Medicine, 2017. 78(12): p. 699-704.
2. Kleigman, B., Jenson. Stanton Saunders International edition, Ed 18th, 2008: p. p2666.
3. Cohen, R.S., R.J. Wong, and D.K. Stevenson, Understanding neonatal jaundice: a perspective on
causation. Pediatrics & Neonatology, 2010. 51(3): p. 143-148.
4. Mwaniki, M.K., et al., Long-term neurodevelopmental outcomes after intrauterine and neonatal
insults: a systematic review. The Lancet, 2012. 379(9814): p. 445-452.
5. Olusanya, B.O., S. Teeple, and N.J. Kassebaum, The contribution of neonatal jaundice to global
child mortality: findings from the GBD 2016 study. Pediatrics, 2018. 141(2).
6. Lawn, J.E., et al., Every Newborn: progress, priorities, and potential beyond survival. The lancet,
2014. 384(9938): p. 189-205.
7. Olusanya, B.O., T.A. Ogunlesi, and T.M. Slusher, Why is kernicterus still a major cause of death
and disability in low-income and middle-income countries? Archives of disease in childhood,
2014. 99(12): p. 1117-1121.
8. Slusher, T.M., et al., Burden of severe neonatal jaundice: a systematic review and meta-analysis.
BMJ paediatrics open, 2017. 1(1).
9. Greco, C., et al., Diagnostic performance analysis of the point-of-care bilistick system in
identifying severe neonatal hyperbilirubinemia by a multi-country approach. EClinicalMedicine,
2018. 1: p. 14-20.
10. Sreedha, B., P.R. Nair, and R. Maity, Non-invasive early diagnosis of jaundice with computer
vision. Procedia Computer Science, 2023. 218: p. 1321-1334.
11. Bhutani, V., R. Vilms, and L. Hamerman-Johnson, Universal bilirubin screening for severe
neonatal hyperbilirubinemia. Journal of perinatology, 2010. 30(1): p. S6-S15.
12. Maisels, M.J. Screening and early postnatal management strategies to prevent hazardous
hyperbilirubinemia in newborns of 35 or more weeks of gestation. in Seminars in fetal and
neonatal medicine. 2010. Elsevier.
13. Vodret, S., et al., Attenuation of neuro-inflammation improves survival and neurodegeneration in
a mouse model of severe neonatal hyperbilirubinemia. Brain, behavior, and immunity, 2018. 70:
p. 166-178.
14. Bhutani, V.K., et al., Neonatal hyperbilirubinemia and Rhesus disease of the newborn: incidence
and impairment estimates for 2010 at regional and global levels. Pediatric research, 2013. 74(1):
p. 86-100.
15. Onyearugha, C., B. Onyire, and H. Ugboma, Neonatal jaundice: Prevalence and associated
factors as seen in Federal medical centre Abakaliki, Southeast Nigeria. J Clin Med Res, 2011. 3(3):
p. 40-45.
16. Tette, E.M., et al., The pattern of neonatal admissions and mortality at a regional and district
hospital in the Upper West Region of Ghana; a cross sectional study. PloS one, 2020. 15(5): p.
17. Greco, C., et al., Neonatal jaundice in low-and middle-income countries: lessons and future
directions from the 2015 Don Ostrow Trieste Yellow Retreat. Neonatology, 2016. 110(3): p. 172-
18. Tewabe, T., et al., Neonatal mortality in the case of Felege Hiwot referral hospital, Bahir Dar,
Amhara Regional State, North West Ethiopia 2016: a one year retrospective chart review. Italian
journal of pediatrics, 2018. 44: p. 1-5.
19. Yismaw, A.E. and A.A. Tarekegn, Proportion and factors of death among preterm neonates
admitted in University of Gondar comprehensive specialized hospital neonatal intensive care
unit, Northwest Ethiopia. BMC research notes, 2018. 11: p. 1-7.
20. Demography, E., Health Survey: Addis Ababa. Ethiopia and Rockville, Maryland, USA: Central
statistics agency and ICF. EDHS, 2016.
21. Castillo, A., et al., Umbilical cord blood bilirubins, gestational age, and maternal race predict
neonatal hyperbilirubinemia. PLoS One, 2018. 13(6): p. e0197888.
22. Scrafford, C.G., et al., Incidence of and risk factors for neonatal jaundice among newborns in
southern Nepal. Tropical Medicine & International Health, 2013. 18(11): p. 1317-1328.
23. Garosi, E., F. Mohammadi, and F. Ranjkesh, The relationship between neonatal jaundice and
maternal and neonatal factors. Iranian Journal of Neonatology, 2016. 7(1): p. 37-40.
24. Omekwe, D.E., et al., Survey and management outcome of neonatal jaundice from a developing
tertiary health centre, Southern Nigeria. IOSR Journal of Dental and Medical Sciences, 2014.
13(4): p. 35-39.
25. Adoba, P., et al., Knowledge level and determinants of neonatal jaundice: a cross-sectional study
in the Effutu Municipality of Ghana. International journal of pediatrics, 2018. 2018.
26. Birhanu, M.Y., et al., Rate and predictors of neonatal jaundice in northwest Ethiopia: prospective
cohort study. Journal of Multidisciplinary Healthcare, 2021: p. 447-457.
27. Brits, H., et al., The prevalence of neonatal jaundice and risk factors in healthy term neonates at
National District Hospital in Bloemfontein. African Journal of Primary Health Care and Family
Medicine, 2018. 10(1): p. 1-6.
28. Kavehmanesh, Z., et al., Prevalence of readmission for hyperbilirubinemia in healthy newborns.
29. Tavakolizadeh, R., et al., Maternal risk factors for neonatal jaundice: a hospital-based cross-
sectional study in Tehran. European journal of translational myology, 2018. 28(3).
30. Khedmat, L., S.Y. Mojtahedi, and A. Moienafshar, Recent clinical evidence in the herbal therapy
of neonatal jaundice in Iran: A review. Journal of Herbal Medicine, 2021. 29: p. 100457.
31. Olusanya, B.O., F.B. Osibanjo, and T.M. Slusher, Risk factors for severe neonatal
hyperbilirubinemia in low and middle-income countries: a systematic review and meta-analysis.
PloS one, 2015. 10(2): p. e0117229.
32. Ogunlesi, T.A. and O.B. Ogunfowora, Predictors of acute bilirubin encephalopathy among
Nigerian term babies with moderate-to-severe hyperbilirubinaemia. Journal of tropical
pediatrics, 2011. 57(2): p. 80-86.
33. Lake, E.A., et al., Magnitude of neonatal jaundice and its associated factor in neonatal intensive
care units of Mekelle city public hospitals, Northern Ethiopia. International journal of pediatrics,
2019. 2019.
34. Fanello, C., et al., Prevalence and Risk Factors of Neonatal Hyperbilirubinemia in a Semi-Rural
Area of the Democratic Republic of Congo: A Cohort Study. The American Journal of Tropical
Medicine and Hygiene, 2023. 109(4): p. 965.
35. Hansen, T.W.R., Narrative review of the epidemiology of neonatal jaundice. Pediatric Medicine,
2021. 4.
36. Ip, S., et al., An evidence-based review of important issues concerning neonatal
hyperbilirubinemia. Pediatrics, 2004. 114(1): p. e130-e153.
37. Kantardzic, M., Data mining: concepts, models, methods and algorithms, A John Wiley & Sons.
Inc. Hoboken, New Jersey, 2011.
38. Han, J. and M. Kamber, Data mining: concepts and techniques. Morgan Kaufmann, 2006. 340: p.
39. Sumathi, S. and S. Sivanandam, Evolution and Scaling of Data Mining Algorithms. Introduction to
Data Mining and its Applications, 2006: p. 151-164.
40. Witten, I.H., et al. Practical machine learning tools and techniques. in Data mining. 2005. Elsevier
Amsterdam, The Netherlands.
41. Adeniyi, D.A., Z. Wei, and Y. Yongquan, Automated web usage data mining and recommendation
system using K-Nearest Neighbor (KNN) classification method. Applied Computing and
Informatics, 2016. 12(1): p. 90-108.
42. Cios, K.J., et al., Text Mining. Data Mining: A Knowledge Discovery Approach, 2007: p. 453-465.
43. Kantardzic, M.M. and J. Gant. Mining Sequences in Distributed Sensors Data for Energy
Production. in FLAIRS. 2007.
44. Larose, D.T., An introduction to data mining. Traduction et adaptation de Thierry Vallaud, 2005.
45. Niakšu, O. and O. Kurasova, Data mining applications in healthcare: research vs practice.
Databases Inf. Syst. BalticDB&IS, 2012. 58: p. 2012.
