
CHRONIC LIVER DISEASE PREDICTION USING MACHINE LEARNING

Abstract— Liver disease prediction using machine learning (ML) techniques has emerged as a crucial area of research aimed at enhancing early detection and management of liver-related disorders. This study proposes a novel approach that leverages ML algorithms to predict the likelihood of liver disease onset in individuals. Through the analysis of diverse patient data, including demographic information, clinical history, laboratory test results, and imaging studies, predictive models are developed to identify patterns and risk factors associated with liver diseases. Various ML algorithms, such as logistic regression, decision trees, support vector machines, and ensemble methods, are explored and compared to determine the most accurate and reliable predictive model. Additionally, feature selection and dimensionality reduction techniques are employed to enhance model performance and interpretability. The proposed ML-based approach offers promising results in terms of prediction accuracy, sensitivity, and specificity, demonstrating its potential as a valuable tool for healthcare professionals in early diagnosis and risk stratification of liver diseases. Ultimately, this research contributes to advancing personalized medicine and improving patient outcomes in the management of liver-related disorders.

Keywords—Machine learning algorithms; classification model; classifier; liver disease

INTRODUCTION
Liver disease is a significant global health concern, encompassing a wide range of conditions such as fatty liver disease, hepatitis, cirrhosis, and liver cancer. Early detection and timely intervention are crucial for effective management and improved patient outcomes. In recent years, the application of machine learning (ML) techniques in healthcare has gained traction, offering the potential to enhance diagnostic accuracy and prognostic capabilities. ML algorithms can analyze large volumes of patient data, including demographic information, clinical parameters, laboratory tests, and imaging studies, to identify patterns and predict the risk of liver disease development. By leveraging these computational methods, healthcare providers can develop personalized risk assessment models and tailor interventions to individuals at higher risk. This study aims to explore the utility of ML algorithms in liver disease prediction, offering insights into the potential benefits of predictive modeling in improving the early detection and management of liver-related disorders.

Liver disease is a significant health issue affecting millions of people globally. Early detection and accurate classification of liver diseases can lead to better patient outcomes and reduce the burden on the healthcare system. One-third of adults and an increasing proportion of young people in affluent nations suffer from non-alcoholic fatty liver disease (NAFLD) [5], a growing health issue. The abnormal buildup of triglycerides in the liver, which in some people causes an inflammatory reaction that can lead to cirrhosis and liver cancer, is the first sign of the condition. While there is a significant correlation between obesity, insulin resistance, and NAFLD, the pathophysiology of NAFLD remains poorly understood, and treatment options are limited. However, machine learning techniques have demonstrated encouraging results in predicting and categorizing liver diseases based on patient data. By utilizing sophisticated algorithms to analyze and learn from large datasets, these techniques can identify patterns and anticipate outcomes. The use of machine learning techniques in liver disease prediction and classification is a dynamic area of research, with continual advancements being made to enhance accuracy and decrease healthcare costs.

Liver disease remains a significant health burden worldwide, encompassing a spectrum of conditions ranging from benign fatty liver disease to life-threatening cirrhosis and hepatocellular carcinoma. Early detection and intervention are paramount for improving patient outcomes and reducing morbidity and mortality associated with liver disorders. In recent years, the advent of machine learning (ML) techniques has revolutionized healthcare by providing powerful tools for predictive modeling and risk assessment. ML algorithms can analyze vast amounts of heterogeneous data, including patient demographics, clinical history, laboratory tests, and imaging findings, to identify patterns and predict disease outcomes. Leveraging these advancements, researchers and clinicians are increasingly exploring ML-based approaches for liver disease prediction, with the potential to enhance diagnostic accuracy, enable personalized medicine, and optimize resource allocation in healthcare settings. This study aims to review and critically evaluate the current landscape of ML-based liver disease prediction models, highlighting their strengths, limitations, and future directions in improving clinical decision-making and patient care.

OVERVIEW OF LIVER DISEASE
Liver disease prediction using machine learning (ML) techniques involves the development and application of
predictive models to identify individuals at risk of developing liver-related disorders. This process begins with the collection and preprocessing of diverse datasets containing patient demographics, clinical information, laboratory test results, imaging findings, and other relevant variables. ML algorithms are then trained on these datasets to learn patterns and relationships between various features and the presence or progression of liver disease. Common ML techniques employed in liver disease prediction include logistic regression, decision trees, support vector machines, random forests, and deep learning approaches. Feature selection and dimensionality reduction methods are often utilized to optimize model performance and interpretability. Once trained, the predictive models are validated on independent datasets to assess their accuracy, sensitivity, specificity, and generalizability. The ultimate goal of liver disease prediction using ML is to enable early identification of individuals at risk, facilitate targeted interventions, and improve patient outcomes by preventing disease progression and complications. This overview highlights the multifaceted approach involved in ML-based liver disease prediction and underscores its potential to revolutionize clinical practice and public health initiatives related to liver health.

CAUSES OF LIVER DISEASE:

INFECTION: Liver disease due to infection encompasses a range of conditions caused by various pathogens, including viruses, bacteria, parasites, and fungi, that affect the liver's structure and function. Viral hepatitis, caused by hepatitis viruses such as hepatitis A, B, C, D, and E, is one of the most common causes of liver disease globally. Hepatitis viruses can lead to acute or chronic inflammation of the liver, with potential complications such as cirrhosis and liver cancer. Other infectious agents, such as bacteria like Escherichia coli, Salmonella, and Streptococcus, can cause liver infections such as bacterial hepatitis, abscesses, and cholangitis. Parasitic infections such as schistosomiasis, caused by Schistosoma parasites, and amoebic liver abscess, caused by Entamoeba histolytica, can also affect the liver, leading to inflammation, abscess formation, and hepatomegaly.

IMMUNE SYSTEM ABNORMALITY: Liver diseases caused by immune system abnormalities are collectively known as autoimmune liver diseases (AILDs). These conditions occur when the body's immune system mistakenly attacks healthy liver cells, leading to inflammation, damage, and dysfunction of the liver. AILDs encompass several distinct disorders, including autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and primary sclerosing cholangitis (PSC).
Autoimmune hepatitis (AIH) is characterized by chronic inflammation of the liver, often resulting in progressive liver damage and fibrosis. It predominantly affects women and is typically diagnosed based on elevated liver enzyme levels, presence of autoantibodies, and histological evidence of interface hepatitis on liver biopsy.
Primary biliary cholangitis (PBC), formerly known as primary biliary cirrhosis, is characterized by autoimmune destruction of the bile ducts within the liver, leading to cholestasis and progressive liver fibrosis. It primarily affects middle-aged women and is diagnosed based on the presence of specific autoantibodies and characteristic histological findings on liver biopsy.

INHERITANCE: Liver diseases due to inheritance encompass a wide range of conditions caused by genetic mutations or abnormalities that are passed down from one generation to the next. These inherited liver diseases can manifest in various ways, affecting liver function, structure, metabolism, or bile flow. Some of the most common inherited liver diseases include:
Hemochromatosis: This is a genetic disorder characterized by excessive accumulation of iron in the liver and other organs. It can lead to liver damage, cirrhosis, and an increased risk of liver cancer. Hemochromatosis is typically caused by mutations in the HFE gene.
Wilson's disease: This is an autosomal recessive disorder characterized by abnormal copper metabolism, leading to copper accumulation in the liver, brain, and other organs. In the liver, copper overload can cause inflammation, fibrosis, and eventually cirrhosis. Wilson's disease is caused by mutations in the ATP7B gene.

CANCER AND LIVER PROGRESSIONS: Liver disease due to cancer refers to malignancies that originate in the liver or metastasize to the liver from other primary sites. The most common type of liver cancer is hepatocellular carcinoma (HCC), which arises from hepatocytes, the main cell type in the liver. Other less common types include intrahepatic cholangiocarcinoma (bile duct cancer) and angiosarcoma (blood vessel cancer).

CHEMICAL COMPOUNDS IN LIVER
Chemicals such as bilirubin, albumin, alkaline phosphatase, aspartate aminotransferase, and globulin are present in the liver and play a vital role in the daily operation of a healthy liver.

Bilirubin: Bilirubin is a yellow pigment derived from the breakdown of heme, a component of hemoglobin from old red blood cells. It is processed by the liver and excreted into bile, eventually being eliminated from the body in feces. Elevated levels of bilirubin can indicate liver dysfunction or bile flow obstruction.

Bile Acids: Bile acids are produced by the liver and stored in the gallbladder. They aid in the digestion and absorption of dietary fats by emulsifying lipids in the intestine. Bile acids are reabsorbed in the ileum and recycled back to the liver in a process known as enterohepatic circulation.

Glucose: The liver plays a central role in glucose metabolism, regulating blood glucose levels through glycogenolysis (breakdown of glycogen) and gluconeogenesis (synthesis of glucose from non-carbohydrate sources). It also stores excess glucose as glycogen for later use.

Lipids: The liver is involved in lipid metabolism, including synthesis of cholesterol, triglycerides, and phospholipids, as well as oxidation of fatty acids for energy production. It also secretes lipoproteins such as very-low-density lipoprotein (VLDL) and high-density lipoprotein (HDL) involved in lipid transport.

Drugs and Xenobiotics: The liver is responsible for metabolizing and detoxifying drugs, environmental toxins, and xenobiotics through a series of enzymatic reactions collectively known as drug metabolism. Cytochrome P450 enzymes, located primarily in hepatocytes, play a crucial role in drug metabolism and detoxification.

Proteins: The liver synthesizes various proteins, including albumin, which helps maintain colloidal osmotic pressure and transports substances in the blood; clotting factors such as fibrinogen, prothrombin, and factors VII, VIII, IX, and X involved in coagulation; and other proteins such as complement proteins and acute-phase reactants.

Ammonia: Ammonia is a byproduct of protein metabolism and microbial fermentation in the intestines. The liver converts ammonia into urea, which is excreted by the kidneys in urine. Hyperammonemia, resulting from liver dysfunction, can lead to hepatic encephalopathy, a neurological condition characterized by confusion, coma, and potentially life-threatening complications.

MACHINE LEARNING AND LIVER DISEASE CLASSIFICATION
Machine learning techniques have emerged as powerful tools for the classification and prediction of liver diseases, offering the potential to enhance diagnostic accuracy and patient care. In recent years, various machine learning algorithms, including logistic regression, decision trees, support vector machines, random forests, and deep learning approaches, have been applied to analyze diverse datasets containing patient demographics, clinical parameters, laboratory test results, imaging findings, and genetic information. These algorithms can identify patterns and relationships between different features, enabling the development of predictive models for liver disease classification.

One of the significant advantages of machine learning-based classification is its ability to handle complex and high-dimensional data, allowing for the integration of multiple types of information to improve diagnostic accuracy. For example, in hepatocellular carcinoma (HCC) classification, machine learning models can incorporate clinical features such as age, gender, and liver function tests, as well as imaging characteristics such as tumor size, morphology, and enhancement patterns from computed tomography (CT) or magnetic resonance imaging (MRI). Moreover, genetic markers and molecular profiling data can be integrated into predictive models to further refine risk stratification and treatment selection.

Machine learning-based classification systems have demonstrated promising results in various liver diseases, including HCC, non-alcoholic fatty liver disease (NAFLD), hepatitis B and C, autoimmune liver diseases, and liver fibrosis. These models have shown high accuracy, sensitivity, and specificity in distinguishing between different disease stages, predicting disease progression, and identifying individuals at higher risk of developing complications such as cirrhosis and hepatocellular carcinoma.

Furthermore, machine learning techniques offer the potential for personalized medicine by tailoring treatment strategies based on individual patient characteristics and disease profiles. By analyzing large-scale datasets and identifying predictive biomarkers, machine learning algorithms can help optimize therapeutic interventions, monitor treatment response, and guide clinical decision-making in liver disease management.

Overall, machine learning-based classification holds great promise in advancing liver disease diagnosis, prognosis, and treatment, paving the way for more accurate and personalized healthcare approaches in hepatology. Continued research and development in this field are essential to further refine predictive models, validate their clinical utility, and integrate them into routine practice for the benefit of patients with liver diseases.

LOGISTIC REGRESSION
Logistic regression, a fundamental machine learning algorithm, has found significant application in the classification of liver diseases. In liver disease diagnosis, logistic regression is employed to predict the probability of a patient having a particular condition based on a set of input variables. These variables may include demographic information, clinical parameters such as liver function tests and imaging findings, as well as genetic markers or molecular profiling data.
One of the key advantages of logistic regression is its simplicity and interpretability, making it well-suited for modeling binary outcomes such as the presence or absence of liver disease. By estimating the probability of disease occurrence, logistic regression provides clinicians with valuable insights into the likelihood of a patient having a particular liver condition, aiding in diagnostic decision-making.
Moreover, logistic regression can handle both continuous and categorical input variables, allowing for the incorporation of diverse types of data into the predictive model. For instance, in hepatocellular carcinoma (HCC) classification, logistic regression models can utilize a combination of clinical features such as patient age, gender, and serum alpha-fetoprotein levels, along with imaging characteristics such as tumor size and vascular invasion status, to predict the likelihood of HCC presence.

CONVOLUTIONAL NEURAL NETWORK
Convolutional Neural Networks (CNNs) have emerged as powerful tools in the realm of liver disease classification, particularly in the analysis of medical images such as CT scans, MRI images, and histopathological slides. CNNs are well-suited for capturing intricate spatial patterns and features within images, making them highly effective in identifying subtle abnormalities indicative of liver diseases.
In liver disease diagnosis, CNNs are applied to analyze medical images and extract relevant features that are indicative of different liver conditions. For example, in hepatocellular carcinoma (HCC) detection, CNNs can identify characteristic imaging features such as arterial enhancement, washout appearance, and capsule appearance, which are crucial for distinguishing HCC lesions from benign liver nodules.
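
To make the feature-extraction idea concrete, the following is a minimal sketch of the three basic operations a CNN applies to an image: a convolution, a ReLU activation, and max pooling. The toy image, the hand-written edge kernel, and all function names are illustrative assumptions and are not taken from any model in this study. A real CNN learns its kernels from labeled images and would be built with a deep learning library rather than plain Python.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Nonlinear activation: negative responses are set to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling that keeps the strongest response per window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy 5x5 "image": dark region on the left, bright region on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, edge_kernel)
features = max_pool(relu(feature_map))
print(features)  # [[2, 0], [2, 0]]
```

The pooled output is non-zero only where the edge lies, which is the sense in which stacked layers of this kind turn raw pixels into location-aware features.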
One of the key strengths of CNNs lies in their ability to
automatically learn hierarchical representations of data,
without the need for manual feature engineering. Through the
use of convolutional layers, pooling layers, and nonlinear
activation functions, CNNs can effectively extract and
hierarchically combine features at different spatial scales,
enabling them to capture complex patterns within medical images.

PREDICTING THE ACCURACY OF LIVER DISEASE USING MACHINE LEARNING
Predicting the accuracy of liver disease diagnosis using machine learning techniques has become increasingly prevalent and promising in recent years. By leveraging diverse datasets containing patient demographics, clinical parameters, laboratory test results, imaging findings, and genetic information, machine learning models can learn intricate patterns and relationships associated with different liver conditions. These models aim to predict the likelihood of liver disease presence or progression, enabling clinicians to make more informed diagnostic and prognostic decisions.
Various machine learning algorithms, including logistic regression, decision trees, support vector machines, random forests, and deep learning approaches, have been applied to liver disease prediction tasks. These algorithms can handle complex and high-dimensional data, allowing for the integration of multiple types of information to improve diagnostic accuracy. For example, in hepatocellular carcinoma (HCC) prediction, machine learning models can incorporate clinical features such as patient age, gender, liver function tests, and imaging characteristics, along with genetic markers or molecular profiling data, to predict the likelihood of HCC occurrence.
Moreover, machine learning models can be trained on large-scale datasets containing labeled examples of liver disease cases and healthy controls, allowing them to learn from diverse examples and generalize well to unseen data. Cross-validation techniques are commonly employed to assess model performance and ensure robustness.
Additionally, machine learning models can be further validated on independent datasets to evaluate their accuracy, sensitivity, specificity, and overall predictive performance. By comparing model predictions with clinical outcomes, researchers and clinicians can assess the reliability and utility of machine learning-based liver disease prediction models in real-world settings.
Overall, predicting the accuracy of liver disease diagnosis using machine learning represents a promising avenue for improving diagnostic accuracy and patient care in hepatology. As research in this field continues to advance, the development of more accurate and reliable machine learning models holds the potential to revolutionize liver disease diagnosis and management, ultimately leading to better patient outcomes and healthcare delivery.

PROPOSED SYSTEM

The proposed system for liver disease prediction using machine learning (ML) aims to revolutionize diagnostic approaches in hepatology by leveraging advanced computational techniques to improve accuracy, efficiency, and personalized patient care. The system integrates diverse datasets containing demographic information, clinical parameters, laboratory test results, imaging findings, and genetic markers to develop robust predictive models for liver disease detection and prognosis.
The cornerstone of the proposed system lies in the utilization of various ML algorithms, including logistic regression, decision trees, support vector machines, random forests, and neural networks. These algorithms are trained on large-scale
datasets to learn intricate patterns and relationships associated with different liver conditions, enabling accurate classification of individuals into disease and non-disease categories. Ensemble methods such as random forests and gradient boosting machines further enhance predictive performance by combining multiple models to mitigate overfitting and improve generalization.

DATA COLLECTION
Selecting data is essential for identifying the records that matter for the analysis and for obtaining useful knowledge through various data mining techniques.

DATA EXPLORATION
Data exploration is an early step in data analysis that summarises the data and reveals initial patterns in the data and its features. Several visualisation approaches, such as histograms and boxplots, are used to identify extreme and outlier values, and feature correlation values are used to identify highly linearly dependent features.

DATA PREPROCESSING

• Imputation of Missing Values – This technique finds the missing values in the data and imputes the null values with the median. In the Indian Liver Patient Dataset (ILPD), the Albumin and Globulin ratio has four missing values, which were replaced by the median value [11].

• Dummy Encoding – Dummy encoding transforms a categorical variable into numerical variables, as most machine learning algorithms are designed to work on numerical data. For each categorical variable with k categories, k-1 numerical variables are created.

• Elimination of Duplicate Values – Redundant values must be discarded from the data in order to improve the efficiency and quality of the data [12].

• Outlier Detection and Elimination – Outliers are exceptional values that differ from the remainder of the data due to measurement or experimental error. Outliers are divided into two categories: univariate and multivariate. A univariate outlier is defined on a single feature, whereas a multivariate outlier is defined across n dimensions of the ILPD features or attributes.

• Resampling – The liver dataset is imbalanced: most records are patients with liver disease and only a small number are non-liver patients. SMOTE is used to balance the data by synthesising additional samples for the minority class [13].

FEATURE SELECTION
Feature selection is the process of limiting the number of input variables so that the machine learning algorithm can train the model faster. It reduces the computational complexity and makes the model easier to interpret.

Random Forest feature selection – Feature selection using random forest offers high accuracy, low overfitting, and easy interpretability by deriving the importance of each feature from the decision trees. Each decision tree in the forest is built from observations drawn at random from the dataset, with features chosen at random [14].

RESULTS

In liver disease prediction using machine learning (ML), the results obtained from the predictive models play a crucial role in evaluating the effectiveness and accuracy of the system. After training the ML algorithms on diverse datasets containing patient information, the performance of the models is assessed using various metrics such as accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC).
The results of the ML-based liver disease prediction models demonstrate promising performance in accurately classifying individuals into disease and non-disease categories. High accuracy rates, typically ranging from 80% to 90% or higher, indicate the system's ability to correctly identify patients with liver disease and those without. Sensitivity measures the proportion of patients with liver disease that the model correctly identifies (true positives), while specificity measures the proportion of patients without liver disease that it correctly classifies (true negatives). High sensitivity and specificity values signify the system's ability to minimize false negatives and false positives, respectively, thereby enhancing diagnostic reliability.

CONCLUSION
In conclusion, liver disease prediction using machine learning (ML) holds significant promise for revolutionizing diagnostic approaches in hepatology. By leveraging advanced computational techniques and diverse datasets containing patient information, ML algorithms have demonstrated remarkable capabilities in accurately predicting the presence or progression of liver diseases. The development of predictive models using algorithms such as logistic regression, decision trees, support vector machines, random forests, and neural networks has enabled clinicians to identify individuals at risk of liver disease with high accuracy, sensitivity, and specificity. These models offer valuable insights into the complex interplay of demographic, clinical, and genetic factors associated with liver disease risk, facilitating early detection, risk stratification, and personalized treatment planning. Moreover, ML-based approaches have the potential to streamline clinical workflow, enhance diagnostic efficiency, and improve patient outcomes in hepatology. As research in ML continues to evolve and datasets grow larger and more diverse, further advancements in predictive modeling and validation in real-world clinical settings are expected to enhance the reliability and utility of ML-based liver disease prediction systems, ultimately leading to improved healthcare delivery and public health outcomes.

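
The first three steps listed under DATA PREPROCESSING (median imputation, k-1 dummy encoding, and duplicate elimination) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the records, the column names (`age`, `gender`, `ag_ratio`), and the helper functions are all hypothetical, and a real project would more likely use a data-frame library. Outlier removal and SMOTE resampling are left out because they depend on thresholds and parameters that the text does not specify.

```python
from statistics import median


def impute_median(rows, column):
    """Fill missing (None) entries of `column` with the median of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def dummy_encode(rows, column):
    """Replace a categorical column that has k categories with k-1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    kept = categories[1:]  # the first category is the baseline (all indicators 0)
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in kept:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def drop_duplicates(rows):
    """Keep only the first occurrence of each distinct record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical records; the column names are illustrative, not the real ILPD schema.
records = [
    {"age": 65, "gender": "Female", "ag_ratio": 0.8},
    {"age": 62, "gender": "Male", "ag_ratio": None},
    {"age": 62, "gender": "Male", "ag_ratio": None},  # exact duplicate of the row above
    {"age": 58, "gender": "Male", "ag_ratio": 1.0},
    {"age": 72, "gender": "Female", "ag_ratio": 1.2},
]

clean = drop_duplicates(dummy_encode(impute_median(records, "ag_ratio"), "gender"))
for row in clean:
    print(row)
```

With two gender categories, a single indicator column (`gender_Male`) remains, which is the k-1 rule in its simplest form; the missing ratio is filled with 1.0, the median of the three observed values.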
FUTURE ENHANCEMENT

REFERENCES
[1] A. Arjmand, C. T. Angelis, A. T. Tzallas, M. G. Tsipouras, E. Glavas, R. Forlano, P. Manousou, and N. Giannakeas, “Deep learning in liver biopsies using convolutional neural networks,” in 2019 42nd International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2019, pp. 496–499.
[2] L. A. Auxilia, “Accuracy prediction using machine learning techniques for Indian patient liver disease,” in 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI). IEEE, 2018, pp. 45–50.
[3] A. Spann, A. Yasodhara, J. Kang, K. Watt, B. Wang, A. Goldenberg, and M. Bhat, “Applying machine learning in liver disease and transplantation: a comprehensive review,” Hepatology, vol. 71, no. 3, pp. 1093–1105, 2020.
[4] S. Sontakke, J. Lohokare, and R. Dani, “Diagnosis of liver diseases using machine learning,” in 2017 International Conference on Emerging Trends & Innovation in ICT (ICEI). IEEE, 2017, pp. 129–133.
[5] J. C. Cohen, J. D. Horton, and H. H. Hobbs, “Human fatty liver disease: old questions and new insights,” Science, vol. 332, no. 6037, pp. 1519–1523, 2011.
[6] F. Himmah, R. Sigit, and T. Harsono, “Segmentation of liver using abdominal CT scan to detection liver desease area,” in 2018 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC). IEEE, 2018, pp. 225–228.
[7] M. B. Priya, P. L. Juliet, and P. Tamilselvi, “Performance analysis of liver disease prediction using machine learning algorithms,” International Research Journal of Engineering and Technology (IRJET), vol. 5, no. 1, pp. 206–211, 2018.
[8] T. R. Baitharu and S. K. Pani, “Analysis of data mining techniques for healthcare decision support system using liver disorder dataset,” Procedia Computer Science, vol. 85, pp. 862–870, 2016.
[9] U. R. Acharya, S. V. Sree, R. Ribeiro, G. Krishnamurthi, R. T. Marinho, J. Sanches, and J. S. Suri, “Data mining framework for fatty liver disease classification in ultrasound: a hybrid feature extraction paradigm,” Medical Physics, vol. 39, no. 7Part1, pp. 4255–4264, 2012.
[10] N. Nahar and F. Ara, “Liver disease prediction by using different decision tree techniques,” International Journal of Data Mining & Knowledge Management Process, vol. 8, no. 2, pp. 01–09, 2018.
[11] A. Naik and L. Samant, “Correlation review of classification algorithm using data mining tool: Weka, Rapidminer, Tanagra, Orange and Knime,” Procedia Computer Science, vol. 85, pp. 662–668, 2016.
[12] A. N. Arbain and B. Y. P. Balakrishnan, “A comparison of data mining algorithms for liver disease prediction on imbalanced data,” International Journal of Data Science and Advanced Analytics (ISSN 2563-4429), vol. 1, no. 1, pp. 1–11, 2019.
[13] M. A. Kuzhippallil, C. Joseph, and A. Kannan, “Comparative analysis of machine learning techniques for Indian liver disease patients,” in 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE, 2020, pp. 778–782.
[14] K. R. Asish, A. Gupta, A. Kumar, A. Mason, M. K. Enduri, and S. Anamalamudi, “A tool for fake news detection using machine learn-
