REVIEW

COIMBATORE INSTITUTE OF TECHNOLOGY

(AN AUTONOMOUS INSTITUTION)


COIMBATORE – 641014

A MACHINE LEARNING METHODOLOGY FOR DETECTING CHRONIC KIDNEY DISEASE

PROJECT GUIDE:
MR. C. MURALE (ASSISTANT PROFESSOR)
DEPARTMENT OF INFORMATION TECHNOLOGY
COIMBATORE INSTITUTE OF TECHNOLOGY

MEMBERS:
1807028 - KAVINKUMAR P
1807030 - MANOJ G
1807052 - SURYA P
OBJECTIVE

• TO DIAGNOSE CHRONIC KIDNEY DISEASE (CKD) IN PEOPLE.
• EARLY DETECTION OF CKD ENABLES PATIENTS TO RECEIVE
TIMELY TREATMENT THAT SLOWS THE PROGRESSION OF THE
DISEASE.
• MACHINE LEARNING MODELS CAN EFFECTIVELY HELP
CLINICIANS IN DIAGNOSING CKD.
• TO IDENTIFY THE PRESENCE OF THE DISEASE EARLIER, WHICH
INCREASES THE CHANCE OF DIAGNOSING CKD.
MOTIVATION

• TO PROTECT PEOPLE FROM THE DISEASE, IT IS ESSENTIAL TO FIND ITS
PRESENCE EARLIER AND QUICKER.
• BUILDING A MACHINE LEARNING MODEL IS AN EFFECTIVE METHOD FOR
DOING THAT.
• A MACHINE LEARNING MODEL HELPS IDENTIFY THE DISEASE AT AN EARLY
STAGE.
ABSTRACT

• CHRONIC KIDNEY DISEASE (CKD) IS A GLOBAL HEALTH PROBLEM THAT INDUCES
OTHER DISEASES.
• EARLY DETECTION OF CKD ENABLES PATIENTS TO RECEIVE TIMELY
TREATMENT TO STOP THE PROGRESSION OF THIS DISEASE.
• THE CKD DATA SET WAS OBTAINED FROM THE UNIVERSITY OF CALIFORNIA
IRVINE (UCI) MACHINE LEARNING REPOSITORY.
• MACHINE LEARNING ALGORITHMS AND NEURAL NETWORKS WILL BE USED
TO BUILD MODELS WITH HIGH ACCURACY.
LITERATURE SURVEY
BASE PAPER
DOMAIN: MACHINE LEARNING

AUTHORS: GUOZHEN CHEN, CHENGUANG DING, YANG LI, XIAOJUN HU, XIAO LI, LI REN
JOURNAL: IEEE
TITLE: Prediction of Chronic Kidney Disease Using Adaptive Hybridized Deep Convolutional Neural Network on the Internet of Medical Things Platform
YEAR: 2020
METHODOLOGY: Adaptive hybridized Deep Convolutional Neural Network (AHDCNN); CNN; Internet of medical things platform (IoMT)
DRAWBACKS: MISSING VALUES ARE FILLED BY THE MEDIAN METHOD, WHICH IS NOT PROMISING
AUTHORS: ALVARO SOBRINHO, ANDRESSA C. M. DA S. QUEIROZ, MARIA ELIETE PINHEIRO, AND ANGELO PERKUSICH
JOURNAL: IEEE
TITLE: Computer-Aided Diagnosis of Chronic Kidney Disease in Developing Countries: A Comparative Analysis of Machine Learning Techniques
YEAR: 2020
METHODS: k-fold cross-validation method based on the Weka software; J48 decision tree
DRAWBACKS: ACCURACY IS ONLY ABOUT 95%; LIMITED VALUES IN DATASET

AUTHORS: ERLEND HODNELAND, EIRIK KEILEGAVLEN
JOURNAL: IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING
TITLE: Detection of CKD using Tissue deformation fields from dynamic MR imaging
YEAR: 2018
METHODS: Image registration method
DRAWBACKS: DATASET IS VERY SMALL; USES DYNAMIC MR IMAGING

AUTHORS: AHMED AMIJOHARI, MOHD HELMY ABD WAHAB
JOURNAL: IEEE International Conference on Information Systems and Computer Networks
TITLE: Two Class Classification comparative experiments for CKD
YEAR: 2019
METHODS: Two-class Decision Forest and Two-class Neural Networks
DRAWBACKS: NUMBER OF SAMPLES IS SMALL; STAGE OF THE DISEASE CANNOT BE IDENTIFIED
AUTHORS: BILAL KHAN, RASHID NASEEM, FAZAL MUHAMMAD, GHULAM ABBAS, SUNGHWAN KIM
JOURNAL: IEEE
TITLE: An Empirical Evaluation of Machine Learning Techniques for Chronic Kidney Disease Prophecy
YEAR: 2020
METHODS: Seven ML techniques together with NBTree, J48 are used. (MAE), (RMSE), (RAE), (RRSE), recall
IMPROVEMENTS: Severity of the disease can't be predicted

AUTHORS: Navaneeth Bhaskar and Suchetha M
JOURNAL: IEEE
TITLE: A Deep Learning-based System for Automated Sensing of Chronic Kidney Disease
YEAR: 2019
METHODS: CNN-SVM integrated network; 1-D Deep Learning convolution network; Used Saliva Samples
IMPROVEMENTS: Only one parameter is taken for prediction, which decreases the prediction capacity

AUTHORS: Shubham Vashisth, Ishika Dhall
JOURNAL: International Conference on Cloud Computing
TITLE: Chronic Kidney Disease (CKD) Diagnosis using Multi-Layer Perceptron Classifier
YEAR: 2020
METHODS: Multi-Layer Perceptron Classifier; CMS data set is used
IMPROVEMENTS: The dataset size can be increased; Accuracy can be increased
AUTHORS: N V Ganapathi Raju, K Prasanna Lakshmi, K. Gayathri Praharshitha, Chittampalli Likhitha
JOURNAL: 2019 ICICCS
TITLE: Prediction of chronic kidney disease (CKD) using Data Science
YEAR: 2019
METHODS: SVM, Random Forest, XGBoost, Logistic Regression, Neural networks, Naive Bayes Classifier
IMPROVEMENTS: The missing values can be filled using KNN; Accuracy can be increased

AUTHORS: Yedilkhan Amirgaliyev, Shahriar Shamiluulu, Azamat Serek
JOURNAL: IEEE
TITLE: Analysis of CKD using Machine Learning Techniques
YEAR: 2018
METHODS: Used Support Vector Machine (SVM)
IMPROVEMENTS: Accuracy is only about 93%; An integrated model can be proposed

AUTHORS: Gunarathne W.H.S.D, Perera K.D.M, Kahandawaarachchi K.A.D.C.P
JOURNAL: IEEE International Conference on Bioinformatics and Bioengineering
TITLE: Performance Evaluation on Machine Learning Classification Techniques and Forecasting through Data Analytics for Chronic Kidney Disease (CKD)
YEAR: 2017
METHODS: Uses Multiclass Decision Tree Classifier
IMPROVEMENTS: Accuracy is about 99.1%; They reduced the dataset size to 15 attributes
AUTHORS: Ahmed J. Aljaaf, Dhiya Al-Jumeily, Hussein M. Haglan
JOURNAL: IEEE Congress on Evolutionary Computation (CEC)
TITLE: Early Prediction of Chronic Kidney Disease Using Machine Learning Supported by Predictive Analytics
YEAR: 2018
METHODOLOGY: The Classification and Regression Tree, i.e. RPART; Two black box models, SVM and MLP
IMPROVEMENTS: No information about any kind of medications has been collected with this data

AUTHORS: Abdullah Al Imran, Md Nur Amin, Fatema Tuj Johora
JOURNAL: International Conference on Innovation in Engineering and Technology (ICIET)
TITLE: Classification of Chronic Kidney Disease using Logistic Regression, Feedforward Neural Network and Wide & Deep Learning
YEAR: 2018
METHODOLOGY: Logistic regression, feedforward neural networks and wide & deep learning to diagnose CKD
IMPROVEMENTS: They simply removed the samples containing missing values (KNN can be used)
AUTHORS: Hanyu Zhang, Che-Lun Hung, William Cheng-Chung Chu, Ping-Fang Chiu and Chuan Yi Tang
JOURNAL: IEEE International Conference on Bioinformatics and Biomedicine
TITLE: Chronic Kidney Disease Survival Prediction with Artificial Neural Networks
YEAR: 2018
METHODOLOGY: Artificial Neural Network (ANN) models applied to survivability prediction on Chronic Kidney Disease (CKD) patients
IMPROVEMENTS: They indicated that the dataset is highly imbalanced (KNN can be used)

AUTHORS: K. Shankar, P. Manickam, G. Devika, M. Ilayaraja
JOURNAL: IEEE International conference on computational and computing research
TITLE: Optimal Feature Selection for Chronic Kidney Disease Classification using Deep Learning Classifier
YEAR: 2018
METHODOLOGY: Ant Lion Optimization (ALO) technique to choose optimal features for the classification process; Deep Neural Network (DNN)
IMPROVEMENTS: Data mining procedures can be utilized as a part of training with enhancing execution of classifiers, and the datasets are expanded
PROPOSED SYSTEM

• CKD IS A HEALTH PROBLEM THAT AFFECTS PEOPLE ALL OVER THE WORLD AND
CAUSES OTHER MAJOR HEALTH PROBLEMS.
• A NUMBER OF PROJECTS HAVE BEEN DONE ON CKD, YET EACH HAS
ITS OWN BENEFITS AND DRAWBACKS.
• OUR PROPOSED SYSTEM FOCUSES ON DEVELOPING A PREDICTIVE MODEL USING
MACHINE LEARNING TECHNIQUES.
• KNN IS USED FOR FILLING MISSING VALUES, AND AN INTEGRATED MODEL IS
PROPOSED.
ARCHITECTURE OF THE SYSTEM

[FLOW DIAGRAM] START -> CKD DATASET COLLECTION -> DATA PREPROCESSING ->
FEATURE SELECTION -> BUILDING THE MODEL -> MODEL PREDICTION -> CKD / NOT CKD
(LIBRARIES OF PYTHON LIKE SKLEARN ARE USED)
MODULES EXPLANATION
DATASET

• THE CKD DATASET IS OBTAINED FROM THE UCI MACHINE LEARNING
REPOSITORY.
• THE DATA SET CONTAINS 400 SAMPLES.
• EACH SAMPLE HAS 24 PREDICTIVE VARIABLES OR FEATURES
(11 NUMERICAL VARIABLES AND 13 CATEGORICAL (NOMINAL)
VARIABLES) AND A CATEGORICAL RESPONSE VARIABLE (CLASS).
• THE CLASS HAS TWO VALUES: CKD (SAMPLE WITH CKD) AND
NOTCKD (SAMPLE WITHOUT CKD).
• OF THE 400 SAMPLES, 250 BELONG TO THE CKD CATEGORY
AND 150 TO THE NOTCKD CATEGORY.
• THE DATA CONTAINS A LARGE NUMBER OF MISSING VALUES.
DATA PREPROCESSING

• THE DATASET CONTAINS MISSING VALUES.
• EACH CATEGORICAL (NOMINAL) VARIABLE WAS CODED TO FACILITATE
PROCESSING BY A COMPUTER.
• FOR EXAMPLE, FOR RBC AND PC, THE VALUES NORMAL AND ABNORMAL
WERE CODED AS 1 AND 0, RESPECTIVELY.
• ONLY 158 OF THE 400 INSTANCES ARE COMPLETE, SO AN
IMPUTATION METHOD IS NEEDED.
• KNN IMPUTATION IS USED TO FILL THE MISSING VALUES AFTER ENCODING
THE CATEGORICAL VARIABLES.
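The coding step and the count of complete instances described above can be sketched as follows. This is a minimal illustration on a made-up four-row table, not the project's notebook code; the frame `toy` and its values are assumptions, and only the column names `rbc` and `pc` and the 1/0 coding come from the slide.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a few CKD columns (values are invented for illustration).
toy = pd.DataFrame({
    "age": [48.0, 7.0, np.nan, 62.0],
    "rbc": ["normal", np.nan, "abnormal", "normal"],
    "pc":  ["normal", "abnormal", np.nan, "normal"],
})

# Code the nominal values as stated on the slide: normal -> 1, abnormal -> 0.
# Missing entries stay NaN so that a later imputation step can fill them.
coding = {"normal": 1, "abnormal": 0}
encoded = toy.copy()
for col in ["rbc", "pc"]:
    encoded[col] = encoded[col].map(coding)

# A "complete instance" is a row with no missing value in any column.
n_complete = int(encoded.notna().all(axis=1).sum())
missing_per_column = encoded.isna().sum()

print(n_complete)
print(missing_per_column)
```

On the real data set the same two lines at the end would report the number of complete instances and show which variables need imputation most.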
FEATURE SELECTION

• EXTRACTING FEATURE VECTORS OR PREDICTORS CAN REMOVE
VARIABLES THAT ARE NEITHER USEFUL FOR PREDICTION NOR RELATED TO
THE RESPONSE VARIABLE.
• HERE, RANDOM FOREST (RF) IS USED TO EXTRACT THE VARIABLES THAT ARE MOST
MEANINGFUL TO THE PREDICTION.
• RF DETECTS THE CONTRIBUTION OF EACH VARIABLE TO THE REDUCTION IN
THE GINI INDEX.
• THE LARGER THE GINI INDEX, THE HIGHER THE UNCERTAINTY IN
CLASSIFYING THE SAMPLES.
• WHEN THE RF WAS USED TO EXTRACT THE VARIABLES, ALL
VARIABLES WERE SELECTED EXCEPT PC, PCC, BA, CAD, PE AND ANE.
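To make the Gini index concrete, the sketch below computes it for a set of class labels and measures the reduction achieved by a split. It assumes the usual definition, 1 minus the sum of squared class proportions; the function names `gini_index` and `gini_reduction` are illustrative, and the 250/150 class counts are the ones reported for this data set.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a collection of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Decrease in Gini index when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)
    return gini_index(parent) - weighted


# Class counts of the CKD data set: 250 CKD and 150 NOTCKD samples.
root = ["ckd"] * 250 + ["notckd"] * 150
root_gini = gini_index(root)  # 1 - (0.625**2 + 0.375**2) = 0.46875

# A hypothetical variable that separates the classes perfectly removes all
# of that uncertainty, so its reduction equals the Gini index of the root.
perfect_split = gini_reduction(root, ["ckd"] * 250, ["notckd"] * 150)

print(root_gini, perfect_split)
```

A random forest accumulates reductions of this kind over all splits that use a variable, which is what ranks the variables by importance.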
MODEL BUILDING

• THE FOLLOWING MACHINE LEARNING MODELS WERE BUILT FOR DIAGNOSING CKD,
USING THE CORRESPONDING SUBSET OF FEATURES OR PREDICTORS ON THE
COMPLETE CKD DATA SETS.
1) REGRESSION-BASED MODEL: LOGISTIC REGRESSION (LOG)
2) TREE-BASED MODEL: RANDOM FOREST (RF)
3) DECISION PLANE-BASED MODEL: SUPPORT VECTOR MACHINE (SVM)
4) DISTANCE-BASED MODEL: K-NEAREST NEIGHBOR (KNN)
5) PROBABILITY-BASED MODEL: NAIVE BAYES (NB)
MODEL BUILDING

• 1) THE OUTPUT OF LOG WAS THE PROBABILITY THAT THE SAMPLE BELONGS
TO NOTCKD, AND THE THRESHOLD WAS SET TO 0.5.
• 2) RF WAS ESTABLISHED USING ALL VARIABLES WITH THE DEFAULT 500 TREES
AND EVALUATED ON THE DATA SETS OBTAINED BY KNN IMPUTATION.
• 3) THE SVM MODELS WERE GENERATED USING THE RBF KERNEL
FUNCTION, WHERE GAMMA (γ) WAS SET TO [0.1, 0.5, 1, 2, 3, 4]. PARAMETER C
REPRESENTS THE WEIGHT OF MISJUDGMENT LOSS, AND IT WAS SET TO [0.5, 1,
2, 3].
• 4) FOR THE NB, THE LAPLACE SMOOTHING VALUE WAS EQUAL TO 1.
• 5) FOR THE KNN, THE NEAREST NEIGHBOR PARAMETER WAS SET TO [1, 3,
5, ..., 19].
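One way these five settings might be expressed with scikit-learn is sketched below, on synthetic data rather than the CKD set. Several points are assumptions rather than the project's code: label 1 stands in for NOTCKD, a 5-fold grid search picks among the listed SVM and KNN values, `n_estimators=500` is set explicitly because scikit-learn's own default is 100, and `GaussianNB` is used even though it has no Laplace parameter (the closest scikit-learn analogue of Laplace = 1 is `CategoricalNB(alpha=1.0)` on categorical features).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data; label 1 plays the role of NOTCKD (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

models = {
    "LOG": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    # RBF-kernel SVM searched over the gamma and C values listed on the slide.
    "SVM": GridSearchCV(
        SVC(kernel="rbf"),
        {"gamma": [0.1, 0.5, 1, 2, 3, 4], "C": [0.5, 1, 2, 3]},
        cv=5,
    ),
    # KNN searched over the odd neighbor counts 1, 3, ..., 19.
    "KNN": GridSearchCV(
        KNeighborsClassifier(),
        {"n_neighbors": list(range(1, 20, 2))},
        cv=5,
    ),
    "NB": GaussianNB(),
}
fitted = {name: model.fit(X, y) for name, model in models.items()}

# LOG outputs a probability for the positive class; apply the 0.5 threshold.
prob_notckd = fitted["LOG"].predict_proba(X)[:, 1]
log_pred = (prob_notckd >= 0.5).astype(int)

print(fitted["SVM"].best_params_, fitted["KNN"].best_params_)
```

The grid search is only one possible reading of "was set to [...]"; the values could equally have been tried one by one and compared by hand.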
TECHNICAL EXPLANATION

• PYTHON IS USED FOR THE ENTIRE IMPLEMENTATION. THE
IMPLEMENTATION IS DONE IN JUPYTER NOTEBOOK.
• IN DATA PREPROCESSING, THE ORDINAL ENCODER (OrdinalEncoder) IS USED TO ENCODE ALL THE
CATEGORICAL VALUES AS NUMERICAL VALUES.
• THEN THE KNN IMPUTER (KNNImputer) IS USED TO FILL IN THE MISSING VALUES WITH FIVE
DIFFERENT VALUES OF K: 3, 5, 7, 9 AND 11.
• AFTER IMPUTATION, MinMaxScaler IS USED TO NORMALIZE
THE VALUES IN THE DATAFRAME.
• ALL THESE CLASSES ARE IMPORTED FROM THE "SKLEARN" PYTHON PACKAGE.
TECHNICAL EXPLANATION

• AFTER PREPROCESSING, FEATURE SELECTION IS DONE USING A RANDOM
FOREST REGRESSOR WITH 1000 ESTIMATORS.
• THEN THE SelectFromModel METHOD IS USED TO SELECT FEATURES WITH
AN IMPORTANCE ABOVE THE THRESHOLD OF 0.0001.
• AS MENTIONED BEFORE, 5 MACHINE LEARNING MODELS ARE DEVELOPED
USING THESE KNN-IMPUTED DATASETS.
• THE CLASSES FOR THESE MODELS ARE ALSO IMPORTED FROM THE "SKLEARN"
PYTHON PACKAGE.
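The feature selection step described above (a random forest regressor with 1000 estimators followed by SelectFromModel with a 0.0001 threshold) might look like the sketch below. The data is synthetic, and the extra all-zero column is added only so that at least one feature falls below the threshold. Note that `RandomForestRegressor` scores splits by variance reduction; `RandomForestClassifier` would be the variant that uses the Gini index.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the imputed, scaled CKD features and class labels.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
# Append a constant column: it can never be used in a split, so its
# importance is exactly zero and it should be dropped.
X = np.hstack([X, np.zeros((X.shape[0], 1))])

forest = RandomForestRegressor(n_estimators=1000, random_state=0)
selector = SelectFromModel(forest, threshold=0.0001).fit(X, y)

mask = selector.get_support()           # True for the kept features
X_selected = selector.transform(X)      # only the kept columns
importances = selector.estimator_.feature_importances_

print(mask)
print(importances.round(4))
```

With a threshold this low, only features with practically no importance are discarded, so on a real data set it behaves as a mild filter.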
OUTPUT: BEFORE DATA PREPROCESSING / AFTER DATA PREPROCESSING
OUTPUT: FEATURE SELECTION
OUTPUT: MODELS ACCURACY
SOCIAL IMPACT

• IDENTIFYING THE DISEASE AT AN EARLY STAGE HELPS
PEOPLE AND CLINICIANS TO DIAGNOSE THE DISEASE EASILY.
• IT ALSO HELPS CLINICIANS TO
TREAT PATIENTS EASILY AND EFFECTIVELY.
• CHRONIC KIDNEY DISEASE (CKD) IS A GLOBAL PUBLIC HEALTH
PROBLEM AFFECTING APPROXIMATELY 10% OF THE WORLD’S
POPULATION.
• THIS WORK CAN BENEFIT PEOPLE WHO ARE
SUFFERING FROM CKD.
CONCLUSION

• THE AIM IS TO DELIVER A MODEL THAT PRECISELY IDENTIFIES CHRONIC KIDNEY DISEASE
PATIENTS ALONG WITH THE SEVERITY OF THE DISEASE.
• WE USE THE DATASET TAKEN FROM THE UCI REPOSITORY.
• MACHINE LEARNING ALGORITHMS AND NEURAL NETWORKS WILL BE USED
TO BUILD THE MODEL.
• AN INTEGRATED MODEL COMPRISING SEVERAL ALGORITHMS IS PROPOSED.
• THE INTEGRATED MODEL BUILDING MODULE WILL BE EXPLAINED IN THE
NEXT REVIEW.
THANK YOU!!
