Final Report
Chapter 1
INTRODUCTION
Data Mining is one of the most promising areas of research, with the purpose of
finding useful information in voluminous data sets. It has been used in many domains
such as image mining, opinion mining, web mining, text mining and graph mining. Its
applications include anomaly detection, financial data analysis, medical data analysis,
social network analysis and market analysis. It has become popular in health organizations
because analytical methods are needed to predict and find unknown patterns and
information in health data, and it plays a vital role in discovering new trends in the
healthcare industry. Data Mining is particularly useful in the medical field when no
evidence favoring a particular treatment option is available. The healthcare industry
generates large amounts of complex data about patients, diseases, hospitals, medical
equipment, claims, treatment cost etc. that require processing and analysis for
knowledge extraction.
Data mining provides a set of tools and techniques which, when applied to this
processed data, give healthcare professionals the knowledge needed to make appropriate
decisions and to improve the performance of patient management tasks. Patients with
similar health issues can be grouped together, and effective treatment plans can be
suggested based on the patient’s history, physical examination, diagnosis and previous
treatment patterns. Chronic Kidney Disease (CKD) has become a global health issue and
is an area of concern. It is a condition in which the kidneys become damaged and cannot
filter toxic wastes from the body. Our work focuses on detecting life-threatening
diseases such as Chronic Kidney Disease (CKD) using the classification algorithms Naive
Bayes and Artificial Neural Network (ANN).
1.1 Overview
Chronic Kidney Disease (CKD) is the progressive and irreversible destruction
of the kidneys. The kidneys are indispensable organs of the human body. They have several
functions, including:
Helping maintain the balance of minerals and electrolytes in the body, such as
calcium, sodium and potassium
Chronic Kidney Disease (CKD), also called chronic renal disease, is a gradual loss of
kidney function over a period of months or years. The symptoms of worsening kidney
function are not specific and may include feeling generally unwell and having a reduced
appetite. Often, chronic kidney disease is diagnosed as a result of screening people
known to be at risk of kidney problems, such as those with high blood pressure or
diabetes and those with a blood relative who has CKD. The disease may also be identified
when it leads to one of its recognized complications, such as cardiovascular disease,
anemia, pericarditis or renal osteodystrophy. CKD is a long-term form of kidney disease;
it is therefore differentiated from acute kidney disease (acute kidney injury) in that the
reduction in kidney function must be present for more than 3 months. CKD is an
internationally recognized public health problem affecting 5-10% of the world population.
1.2 Motivation
The present lifestyles of people, their working environment and their diet have given rise to
many diseases, one of which is Chronic Kidney Disease. Chronic Kidney Disease
(CKD) is prevalent nowadays and has become a global health issue which must be
detected and diagnosed in time.
The kidneys are important organs of the human body that remove toxic and unwanted
waste from the blood, enabling the smooth functioning of the body's organs. CKD is a
condition that describes the loss of kidney function over time, making it difficult for the
kidneys to filter poisonous wastes from the body.
Data mining techniques provide a set of tools for analyzing large volumes of data.
Besides helping to detect chronic disease in the human body, they can help identify other
symptoms that can be treated once diagnosed.
The proposed system automates Chronic Kidney Disease prediction using the
classification technique Naive Bayes and the Artificial Neural Network (ANN) technique.
Chronic Kidney Disease has been predicted and diagnosed using two data mining
classifiers, ANN and Naive Bayes, and the performance of these algorithms is compared
using the RapidMiner tool. The results showed that Naive Bayes achieved higher accuracy
than ANN. In this system, some of the factors considered were age, diabetes, blood
pressure and RBC count. The work can be extended by considering other parameters such
as food type, working environment, living conditions, availability of clean water and
environmental factors for kidney disease detection.
Initially, the dataset is collected from various sources and stored in the database. The
dataset consists of 24 attributes (plus the class attribute), and each attribute has its own
range. In the next step, preprocessing takes place, in which noisy and irrelevant data are
removed. The preprocessed data are sent to the classification algorithm and classified
using the Naive Bayes and ANN algorithms, as shown in Fig. 1.1.
[Fig. 1.1: System workflow. Data collection → Preprocessing → Classification Algorithm (Naïve Bayes & Neural Network) → Model; Test Data → Model → Prediction]
The classified dataset of previous patients and the test results of a new patient are given
to the model, where they are arranged in a structured manner. In the analysis phase both
are analyzed, and then the result is predicted.
The clinical data of 400 records considered for analysis has been taken from the UCI
Machine Learning Repository. After cleaning and removing missing values, 220 records
remain. The data has been analyzed using the RapidMiner tool. There are 25
attributes in the dataset. The numerical attributes include age, blood pressure, blood
glucose random, blood urea, serum creatinine, sodium, potassium, hemoglobin, packed
cell volume, WBC count and RBC count. The nominal attributes include specific gravity,
albumin, sugar, RBC, pus cell, pus cell clumps, bacteria, hypertension, diabetes mellitus,
coronary artery disease, appetite, pedal edema, anemia and class.
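The cleaning step (removing records with missing values before classification) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the RapidMiner process used in this work: records are assumed to be dictionaries of strings, missing values are assumed to be marked with "?", and the attribute keys, the `NUMERICAL` set and the sample records are hypothetical.

```python
# Minimal cleaning sketch: drop records with any missing value and convert
# numerical attributes to floats. Assumption: "?" marks a missing value.
from typing import Dict, List

MISSING = "?"
NUMERICAL = {"age", "bp", "hemo"}  # hypothetical subset of the numerical attributes

def clean(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    cleaned: List[Dict[str, object]] = []
    for rec in records:
        values = {k: v.strip() for k, v in rec.items()}
        if any(v in ("", MISSING) for v in values.values()):
            continue  # incomplete record: removed during preprocessing
        cleaned.append({k: float(v) if k in NUMERICAL else v for k, v in values.items()})
    return cleaned

# Hypothetical records (not taken from the UCI dataset).
raw = [
    {"age": "50", "bp": "80", "hemo": "13.0", "htn": "yes", "class": "ckd"},
    {"age": "35", "bp": "70", "hemo": "?", "htn": "no", "class": "notckd"},
    {"age": "61", "bp": "90", "hemo": "10.5", "htn": "yes", "class": "ckd"},
]
print(len(clean(raw)))  # 2
```

Dropping incomplete records is the simplest option; imputing missing values instead would keep more records at the cost of introducing estimated data.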
The validation process examines the accuracy of the fitted model and its performance on
new data: the model is built from a training dataset, and its performance is measured on a
separate testing dataset.
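A minimal hold-out validation sketch follows. It assumes a simple random split with a fixed seed and a 30% test fraction (both hypothetical choices, not taken from this work); `train_test_split` and `accuracy` are illustrative helpers, and the rule-based predictor in the usage lines is only a placeholder for a trained Naive Bayes or ANN model.

```python
# Hold-out validation sketch: shuffle, split into training and test sets,
# then measure accuracy on the unseen test set.
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_test_split(rows: Sequence[T], test_fraction: float = 0.3,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(predict: Callable[[T], str], test: Sequence[T],
             label: Callable[[T], str]) -> float:
    if not test:
        raise ValueError("empty test set")
    correct = sum(1 for row in test if predict(row) == label(row))
    return correct / len(test)

# Hypothetical rows: (attributes, class label).
rows = [({"htn": "yes"}, "ckd")] * 6 + [({"htn": "no"}, "notckd")] * 4
train, test = train_test_split(rows)
# Placeholder predictor standing in for a model fitted on `train`.
acc = accuracy(lambda r: "ckd" if r[0]["htn"] == "yes" else "notckd", test, lambda r: r[1])
print(len(train), len(test), acc)  # 7 3 1.0
```

Keeping the test records out of training is what lets the accuracy figure say something about performance on new patients.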
Chapter 2
LITERATURE SURVEY
2.1 Survey Papers
[1] Performance analysis of classification data mining techniques over heart disease
data base.
The healthcare industry collects huge amounts of healthcare data which,
unfortunately, are not “mined” to discover hidden information for effective decision
making. Discovery of hidden patterns and relationships often goes unexploited. Advanced
data mining techniques can help remedy this situation.
It can serve as a training tool to train nurses and medical students to diagnose
patients with heart disease. It is a web-based, user-friendly system and can be used in
hospitals that have a data warehouse. The paper analyzes the performance of the two
classification data mining techniques using various performance measures.
The effectiveness of the models was tested using two methods: Classification Matrix
and Lift Chart. The system can also provide decision support to assist doctors in
making better clinical decisions, or at least provide a “second opinion.”
[2] Mining Medical Data to Identify Frequent Diseases using Apriori Algorithm.
Data mining is the process of analyzing large amounts of data from different perspectives
and summarizing it into useful information. The information can be converted into
knowledge about historical patterns and future trends. Patients from different locations
approach different hospitals.
They do not converge in the same place, and their records are maintained by the
hospitals where they are treated, so collecting information about frequently occurring
diseases is not an easy job. Data about these diseases can be collected through
association rules: the Apriori association rule algorithm is adopted for mining the
data. Details regarding the occurrence of these diseases in a particular time period can
also be mined using the Apriori algorithm.
The proposed method is useful for identifying the frequent diseases in a large medical
dataset. The outcome of this research will help practitioners make medical
decisions about frequently occurring diseases. The analysis is made on data from various
geographical locations during different time periods.
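The frequent-itemset step at the heart of Apriori can be sketched as follows. This minimal Python version is written for illustration and is not taken from the surveyed paper: it treats each patient record as a set of diagnosed diseases (the sample records are hypothetical) and keeps every itemset whose support count reaches `min_support`. Generating association rules from these itemsets is a further step not shown here.

```python
# Minimal Apriori sketch: find every itemset whose support count
# (number of records containing it) is at least min_support.
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    frequent: Dict[FrozenSet[str], int] = {}
    candidates = {frozenset([item]) for t in transactions for item in t}  # 1-itemsets
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # with an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                    frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
    return frequent

# Hypothetical records: the set of diseases diagnosed in each patient record.
visits = [{"fever", "diabetes"}, {"fever", "hypertension"},
          {"fever", "diabetes", "hypertension"}, {"diabetes"}]
freq = apriori(visits, min_support=2)
print(len(freq))  # 5
```

The pruning step is what makes Apriori efficient: a larger itemset is only counted if all of its smaller subsets are already known to be frequent.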
cluster similarity is low. Cluster similarity is measured in regard to the mean value of the
objects in a cluster. First, it randomly selects k of the objects, each of which initially
represents a cluster mean or centre. For each of the remaining objects, an object is
assigned to the cluster to which it is the most similar, based on the distance between the
object and the cluster mean. It then computes the new mean for each cluster. This process
iterates until the criterion function converges. This clustering procedure is applied to the
parameters, and the survival period of the patient is identified. The clustering is applied
based on age and gender. Patients whose parameters have normal values have a better
survival rate than those with low or high values.
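The k-means steps described above can be sketched in Python as follows. It is a minimal illustration, assuming Euclidean distance, points given as numeric tuples and hypothetical two-dimensional sample data; it is not the surveyed paper's implementation.

```python
# Minimal k-means sketch: pick k objects as initial means, assign each object
# to the nearest mean, recompute the means, and repeat until no object moves.
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    means: List[Point] = rng.sample(list(points), k)  # k random objects as initial centres
    assignment: List[int] = []
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda j: math.dist(p, means[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no object changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment

# Hypothetical 2-D points forming two obvious groups.
pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
means, labels = kmeans(pts, 2)
print(sorted(means))  # [(1.25, 1.5), (8.5, 8.25)]
```

Because the initial means are chosen at random, cluster numbering can differ between runs; only the grouping itself is meaningful.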
[4] An empirical study on prediction of heart disease using classification data mining
techniques.
The use of pattern recognition and data mining techniques in risk prediction
models in the clinical domain of cardiovascular medicine is proposed. The data is
modeled and classified using classification data mining techniques. One limitation of
conventional medical scoring systems is that they rely on intrinsic linear combinations of
the variables in the input set and hence are not adept at modeling nonlinear complex
interactions in medical domains. This limitation is handled in this research by the use of
classification models, which can implicitly detect complex nonlinear relationships between
dependent and independent variables as well as all possible interactions between
predictor variables.
disease processes are at work, generating a complex sequence of abnormal findings that
can be interpreted in a variety of ways.
It also proposes a data discovery algorithm for small data sets with high
dimensionality. A support vector machine is applied to classify the feature vectors. Finally,
particle swarm agents are used to discover the SVM classification rules. It has been
shown that this algorithm can manage the rule extraction task efficiently. We will develop
and evaluate a new approach for interactive data mining based on swarm intelligence. The
proposed method will process external rules along with the raw data to perform reasoning.
It is designed to work in low-sample, high-dimensional feature space conditions, where
the statistical power of the raw data is not sufficient for a reliable decision.
designed in order to achieve better and more efficient results. Finally, the optimized
results are calculated and compared, and the comparison shows the best results.
Lung cancer, a disease highly dependent on historical data for early diagnosis, has
led researchers to pursue data mining techniques for the pre-diagnosis process.
The five-year survival rate increases to 70% with early detection at stage 1, when the
tumor has not yet spread. Existing medical techniques such as X-ray, Computed
Tomography (CT) scan, sputum cytology analysis and other imaging techniques not only
require complex equipment and high cost but have also proven effective only in stage 4,
when the tumor has metastasized to other parts of the body. The proposed system involves
the development of a data mining tool that will help classify patients into the
category that could potentially test positive for lung cancer in stage 1. Based on the
pre-diagnosis results from the tool, the doctor can perform the diagnosis to confirm
the tumor in the patient and initiate treatment at an early stage, thereby
increasing the survival rate.
The method of applying data mining techniques to identify effective pre-diagnosis
of the disease can improve practitioner performance. Lung cancer, being a disease which
is highly dependent on historical data, can make use of data mining for its early detection.
Researchers have been investigating the application of various data mining techniques to
lung cancer datasets for early diagnosis of lung cancer. This paper proposes a model for
measuring whether applying data mining techniques to a lung cancer dataset can provide
reliable performance in the detection of lung cancer at Stage I. The proposed system uses
the most effective method to extract knowledge and information from the existing lung
cancer profile data. Data cleaning is a challenging step here, as the data collected from
heterogeneous sources does not contain all the required attributes. Normally, performance
improves as the amount of training data increases.
age of the patient; the next is the spectacle prescription, which describes the type of
spectacles the patient is using; and the last is astigmatism, which is a type of eye defect.
Based on these, the output class provides the type of lens recommended by the doctor.
choice. In a fuzzy inference system (FIS), fuzzy set theory is applied to map
inputs (or attributes) to outputs. The fuzzification process transforms crisp
values into grades of membership for the linguistic terms of fuzzy sets. Membership
functions are used to associate a grade with each linguistic term. Defuzzification is the
process of producing a quantifiable result in fuzzy logic, given the fuzzy sets and the
corresponding membership degrees obtained from fuzzification. The designed fuzzy
expert system can produce precise and accurate results from its input variables.
Compared with traditional approaches used by hospitals, the system can predict the
health of the kidney.
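The fuzzify, infer and defuzzify pipeline described above can be sketched as below. This is a minimal, hypothetical example: the input is an abstract 0-10 score rather than a clinical measurement, the triangular membership functions and the three rules are invented for illustration, and defuzzification uses a weighted average of singleton output values (a Sugeno-style simplification) rather than the centroid method a full Mamdani FIS would use.

```python
# Minimal fuzzy inference sketch with hypothetical terms and rules.
from typing import Dict, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)

def triangular(x: float, tri: Triangle) -> float:
    """Membership grade of x in a triangular fuzzy set."""
    a, b, c = tri
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for an abstract 0-10 input score.
INPUT_TERMS: Dict[str, Triangle] = {
    "low": (-1.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 11.0),
}
# Hypothetical rule base: IF input is <term> THEN output is <label>.
RULES = {"low": "healthy", "medium": "at_risk", "high": "unhealthy"}
# Each output label is represented by a single crisp value (singleton).
OUTPUT_VALUES = {"healthy": 0.0, "at_risk": 50.0, "unhealthy": 100.0}

def infer(x: float) -> float:
    # Fuzzification: crisp input -> membership grade for each linguistic term.
    grades = {term: triangular(x, tri) for term, tri in INPUT_TERMS.items()}
    total = sum(grades.values())
    if total == 0:
        raise ValueError("input lies outside every fuzzy set")
    # Defuzzification: weighted average of the fired rules' output values.
    return sum(g * OUTPUT_VALUES[RULES[t]] for t, g in grades.items()) / total

print(infer(2.5))  # 25.0
```

An input of 2.5 belongs half to "low" and half to "medium", so the crisp output lands halfway between the two rules' output values.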
Data mining tools have been developed for the effective analysis of medical
information to help clinicians make better diagnoses. In this research work, the
researcher collects data from the Hospital Information System (HIS), which has
sufficient patient details, including the patient’s name, age, disease, location, district
and data from laboratories, and which keeps on growing year after year. Having collected
the data from the hospital information system, the research finds the frequent diseases
with the help of association techniques. This research work helps to mine data about
frequent diseases with the help of tools applied over the training data set. [2]
The dataset has a large volume of data, which consumes more time for
classification. Thereby reducing the dimensionality of data using the attribute selection
Both injection and rejection of rules are supported, to allow interactive and effective
contributions by an expert user. The well-known support vector machine (SVM)
classifier and a swarm data miner will be integrated to handle joint processing of the raw
data and the rules. [5]
The effectiveness of the models was tested using different data mining methods. The
purpose is to determine which model gives the highest percentage of correct predictions
for diagnosing patients with major life-threatening diseases. The purpose of this study is
to investigate the use of different classifiers as tools for data mining, predictive modeling
and data processing in the prognosis of diseases. The goal of any modeling exercise is to
extract as much information as possible from the available data and to provide an
accurate representation of both the knowledge and the uncertainty about the epidemic.
Predicting the survivability of life-threatening diseases has been a challenging research
problem for many researchers, and since the early days of this research, many advances
have been recorded in several related fields [8].
databases. Data mining is also described as an essential process in which intelligent
methods are applied in order to extract data patterns. Rule extraction is the basic process
of data mining, and if-then rules are the most common form of rule extracted when
knowledge is drawn from a large database. To obtain the best possible solution
in the extraction [9].
The delivery of a precise and elaborate diagnosis of disease is important and crucial
for the well-being of patients. Conventional diagnosis systems for renal diseases today
involve taking several tests, including blood sugar, BUN (Blood Urea Nitrogen) and
creatinine. The aim is to develop a system which doctors can use for a more precise
analysis of kidney condition. Diagnosing kidney condition is vital in almost all
medical fields today [10].
Chapter 3
SOFTWARE REQUIREMENT SPECIFICATIONS
3.1 Introduction
Software Requirement Specification (SRS) is a fundamental document, which
forms the foundation of the software development process. SRS not only lists the
requirements of a system but also has a description of its major features. These
recommendations extend the IEEE standards. The recommendations would form the basis
for providing clear visibility of the product to be developed, serving as a baseline for
the execution of a contract between the client and the developer.
SRS constitutes the agreement between clients and developers regarding the
contents of the software product that is going to be developed. SRS should accurately and
completely represent the system requirements as it makes a huge contribution to the
overall project plan.
The software being developed may be a part of the overall larger system or may
be a complete standalone system in its own right. If the software is a system component,
the SRS should state the interfaces between the system and the software portion.
3.2 Stakeholders
The Stakeholders of the project are:
Team members
Project guide
Project reviewers
Department Faculties
College management
Organization’s officials
Admin
Doctor
Patient
Receptionist
Module 1: Admin
In this module, the admin keeps track of the day-to-day processes done in the
application.
Staff Creation: Responsible for creating staff accounts and managing their
activity.
Add Stage Data: Admin adds stages into the database.
Constraints: Admin adds patient constraints into the database.
Ranges: Admin adds a set of predefined ranges to the constraints.
Password: Admin can reset his own password and can also create staff
passwords.
Update: Admin can update or upload additional data of staff, stages, ranges and
constraints into the database.
Delete: Admin can delete staff, stages, ranges and constraints from the
database.
Module 2: Doctor
The doctor can detect whether a patient has CKD or not.
Upload Patient Details: The doctor can upload a patient’s clinical data and can
modify the record set later. This helps to monitor the patient’s clinical record.
View Treatment Details: After uploading treatment details, the doctor can also view them.
Module 4: Receptionist
The receptionist can register patients, upload and view their details, and generate their
billing details.
Register: The receptionist registers patient information by creating a new account.
Upload of data: The receptionist can upload the patient’s clinical data.
Billing Details: The receptionist generates billing details for each patient.
View Patient Details: The receptionist can view each patient’s information and
make changes whenever needed.
Re-usability: The system is a web-based application; once a user creates an
account, the user can access the system multiple times.
3.5 System Requirements
3.5.1 Software Requirements
Framework: .NET
IDE: Visual Studio 2010
Front End: ASP.NET 4.0
Programming language: C#.NET
Chapter 4
SYSTEM ANALYSIS AND DESIGN
4.1 System Analysis
System Analysis is a detailed analysis of the various operations performed by a system
and their relationships within and outside the system. It is a systematic technique that
defines goals and objectives. One of the main aspects of analysis is defining the
boundaries of the system.
System analysis study has been conducted with the following objectives.
Identify user’s need.
Evaluate the system concept for feasibility.
Perform economic and technical analysis.
Allocate function to hardware, software, people, databases etc.
The diagrams are specified in a precise, concise and highly readable manner. They
show the working system and how its parts interact.
Application-tier: Also called the middle tier, logic tier or business logic tier, this tier
is pulled out of the presentation tier. It controls application functionality by
performing detailed processing.
Data-tier: Houses database servers where information is stored and retrieved.
Data in this tier is kept independent of application servers or business logic.
Data mining techniques are used in building our architecture, in which data is a key
factor used throughout the system. All related data and information are stored
accurately in the database. A database is maintained for each module, storing its data
values, and all functionalities work accordingly. The algorithm is applied at each step of
a module to predict the result within the ranges and constraints given at the time values
are inserted into the record.
The main goal of the system is to detect CKD in a patient from risk factors and
different attribute values using the Naive Bayes algorithm. The system has four modules,
each with different functionality and operations. All functions are defined within the
system and work by taking an input value and fetching the result from the
database.
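Mapping a test value onto the admin-defined ranges of a constraint amounts to discretizing it before classification. The sketch below assumes each constraint's ranges are stored as sorted boundary values with one label per interval; the function name `to_range_label`, the boundaries and the labels are hypothetical, not clinical thresholds from this system.

```python
# Map a numerical test value to the label of the range it falls in.
# Assumption: ranges are stored as sorted boundaries; a value equal to a
# boundary belongs to the range above it.
from bisect import bisect_right
from typing import List

def to_range_label(value: float, bounds: List[float], labels: List[str]) -> str:
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than boundaries")
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("boundaries must be strictly increasing")
    return labels[bisect_right(bounds, value)]

# Hypothetical ranges for one constraint:
# below 1.5 -> "X", 1.5 up to 5.0 -> "Y", 5.0 and above -> "Z".
bounds, labels = [1.5, 5.0], ["X", "Y", "Z"]
print([to_range_label(v, bounds, labels) for v in (0.9, 1.5, 7.2)])  # ['X', 'Y', 'Z']
```

Range labels produced this way are nominal values, which is the kind of input a categorical Naive Bayes classifier expects.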
[Use case diagram. Admin: login, stages, staff, constraints, range, set ID/password, change password. Doctor: upload, result, update/delete, change password. Receptionist: register, upload, billing, change password. Patient: feedback, view treatment details.]
[Figure: login and operations per module. Admin: ID and password → success → create staff, view staff, input stages, view stages, upload password, logout. Doctor: ID and password → success/failed → upload new patient constraint, view result, logout. Patient: ID and password → success/failed → upload feedback, view treatment details, logout. Receptionist: ID/password → success/failed → upload old patient detail, register patient details, logout.]
In our system, the workflow of each module is explained, and a detailed description of it
is given using an activity diagram with the relevant symbols and notations. Each
module has one input and one output, and individual functions are defined within the
system.
Admin: Each module has a login; using the credentials given, users can see the data
and view their related information. If the login does not match the password, an "invalid"
message is shown. The figure shows the activity diagram of the admin module. When the
admin logs in successfully, they can enter data related to staff members, stages,
constraints and the range of the patient test record. They have permission to change
their own password.
Receptionist: As shown in the figure, once the receptionist logs in to his/her page in the
system, they can register patients, upload patient information, generate billing details
and view the treatment details given by the doctor. They are responsible for managing
each patient's account. An error message is displayed if they enter wrong login details.
Patient: In this module, when patients log in to their page, they can view treatment
details, which consist of the stage of the disease the patient is suffering from, symptoms
and the medical prescription given by the doctor. As shown in the figure, they are
provided with a feedback field where they can give their thoughts and discuss their
doubts and issues.
Doctor: Using the login details, doctors are given the privilege to upload patient test
records and to give constraints and ranges depending on the initial stage of CKD
detection. Once a patient's treatment begins, they can keep track of improvement, adding
ranges of attributes to their own database. The doctor can select a range for each
patient, which helps to detect the disease.
[Activity diagrams. For each module (Admin, Receptionist, Patient, Doctor): Login → Invalid / Valid → user-type check (Is Admin, Is Receptionist, Patient, Is Doctor).]
A Data Flow Diagram (DFD) does not show the timing of operations or the information
about the processes the data undergoes in the system. A DFD shows, as a visual display,
what information will be input to and output from the system, where the data come from
and go to, and where the data will be stored.
A DFD uses a set of symbols such as rectangles, circles and arrows, plus short text labels,
to show data inputs, outputs, storage points and the routes between each destination.
With the help of a data flow diagram, users are able to visualize how the system will
operate, what the system will accomplish, and how the system will be implemented.
Our project has four individual modules, each represented in the DFD; we explain the
functionality of each module and how it relates to the other modules according to the
flow.
All the above modules are interrelated through a set of processes defined in the
structured system analysis using defined symbols and notations. Each module has at
least one input and one output. The data used in the system is stored in the
database separately. As a result, by giving a relevant set of inputs, we can fetch the
output in the form of data.
Admin Module:
An admin supports people or groups of people in a business enterprise in various roles,
managing routine administrative tasks within an organization or department.
In our project, the admin plays an important role by handling various tasks within the
system. After getting access to the page, the admin monitors different tasks and keeps
track of patients' details by inputting different stages and constraints.
This module handles operations such as inputting patient records, updating/deleting
records in the database, and monitoring and viewing patient test records. The admin can
set ranges for the constraints and alter them when necessary, and can modify the system
by adding or removing constraints.
[DFD, Admin module: Admin login (ID/password) → constraints, range → DATABASE]
The admin has permission to add staff to the database by creating user IDs and
passwords for them, and can monitor staff behavior. Adding staff information helps keep
track of which staff member is responsible for a patient, so that time is saved at the
time of the patient’s evaluation.
If a patient’s record changes, the admin has the authority to update the patient's
dataset in the DB. The admin can delete datasets when there are duplicate entries in the
database, and can monitor the patients’ data in finding the symptoms that cause Chronic
Kidney Disease (CKD).
Once data are uploaded to the database, the admin is able to view the patient record and
also has access to the patient's past records, which makes communication with the doctor
easier; the prescription is then given depending on the health check-ups.
Receptionist Module:
[DFD, Receptionist module: Receptionist login (ID/password) → billing details, patient registration, password → DATABASE]
This module contains the patient’s previous clinical data, patient registration, patient
history and billing details.
When a patient visits the clinic, it is the receptionist's job to register his/her account in
the clinic database. If the user is already registered, there is no need to create a new
account in the DB. This module has access to view the previous history of the patient's
data record set.
Patient Module:
[DFD, Patient module: Patient login (ID/password) → DATABASE]
A patient is given three functions: giving feedback, viewing treatment details and
changing the password. Initially, the patient is registered with a user ID and password.
Once the patient logs in to the account, they can view their record details, which contain
information about his/her treatment.
In this module, patients are authorized to change their password so that their data is
secure. Only they can log in to their account and provide feedback on the clinic's
service, how they feel about the treatment given in the clinic and the care taken by staff
members. Feedback helps the clinic and staff members to improve in the future so that
they can take care of patients smoothly.
Doctor Module:
[DFD, Doctor module: Doctor login (ID/password) → patient data, new patient data, treatment data, stages, result, password → DATABASE]
In this module, the doctor uploads the new patient’s clinical data and treatment
details, and based on the test result decides whether the patient has CKD or not.
The doctor can add the new patient’s constraints to the dataset and can also see the
patient's record and treatment details; depending on the test result, the system can
predict the stage as well as whether the patient has CKD.
[ER diagram. Entities: Login/User Type (id, password), Patients (pid, pname, email), Treatment (tid, tname), Stages (Stage 1, Stage 2, Stage 3) and Constraints (e.g. serum, age, sugar). Relationships: register, modify, upload, has and contains, mostly one-to-many (1:n).]
An Entity-Relationship Model (ER Model) is a data model for describing the data or
information aspects of a business domain or its process requirements, in an abstract way
that lends itself to ultimately being implemented in a database such as a relational
database. The main components of ER models are entities and the relationships that can
exist among them.
An Entity-Relationship Model is a systematic way of describing and defining a
business process. The process is modeled as components (entities) that are linked with
each other by relationships that express the dependencies and requirements between
them.
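As an illustration of how the entities in the ER diagram could map to relational tables, here is a minimal sketch using Python's built-in `sqlite3` module. The project itself uses SQL Server; the table and column names below are hypothetical, loosely following the entity attributes shown (pid, pname, email, tid, tname), and the one-to-many links are expressed as foreign keys.

```python
# Hypothetical relational schema derived from the ER diagram (SQLite for illustration).
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id            TEXT PRIMARY KEY,
    password_hash TEXT NOT NULL,  -- store a hash, never the plain password
    user_type     TEXT NOT NULL
        CHECK (user_type IN ('admin', 'doctor', 'receptionist', 'patient'))
);
CREATE TABLE patients (
    pid   INTEGER PRIMARY KEY,
    pname TEXT NOT NULL,
    email TEXT
);
CREATE TABLE stages (
    stage_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL              -- e.g. 'Stage 1'
);
CREATE TABLE treatment (                -- one patient has many treatments
    tid      INTEGER PRIMARY KEY,
    tname    TEXT NOT NULL,
    pid      INTEGER NOT NULL REFERENCES patients(pid),
    stage_id INTEGER REFERENCES stages(stage_id)
);
CREATE TABLE patient_constraints (      -- one patient has many constraint values
    cid   INTEGER PRIMARY KEY,
    pid   INTEGER NOT NULL REFERENCES patients(pid),
    name  TEXT NOT NULL,                -- e.g. 'age', 'sugar'
    value TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO patients (pid, pname) VALUES (1, 'Akash')")
conn.execute("INSERT INTO stages (stage_id, name) VALUES (1, 'Stage 1')")
conn.execute("INSERT INTO treatment (tid, tname, pid, stage_id) VALUES (1, 'diet plan', 1, 1)")
rows = conn.execute(
    "SELECT p.pname, s.name FROM treatment t "
    "JOIN patients p ON p.pid = t.pid JOIN stages s ON s.stage_id = t.stage_id"
).fetchall()
print(rows)  # [('Akash', 'Stage 1')]
```

Each 1:n relationship in the diagram becomes a foreign key on the "many" side, so a join reconstructs which stage and treatment belong to which patient.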
Table 1 – Users
Table 3 – Stages
Table 6 – Patients
Table 8 – Treatment
CHAPTER 5
IMPLEMENTATION
Implementation can be described as the realization of an application, or the execution of
a plan, idea, model, design, specification, standard, algorithm, or policy. In computer
science, an implementation is the realization of a technical specification or algorithm
as a program, software component, or other computer system through computer
programming and deployment. Many implementations may exist for a given
specification or standard.
enforcing strict type safety and other forms of code accuracy that ensure security and
robustness. In fact, the concept of code management is a fundamental principle of the
runtime. Code that targets the runtime is known as managed code, while code that does
not target the runtime is known as unmanaged code.
The class library, the other main component of the .NET Framework, is a
comprehensive, object-oriented collection of reusable types that you can use to develop
applications ranging from traditional command-line or graphical user interface (GUI)
applications to applications based on the latest innovations provided by ASP.NET,
such as Web Forms and XML Web services.
1. DataSet
The DataSet is a disconnected, in-memory representation of data. It can be
considered a local copy of the relevant portions of the database. The DataSet is held
in memory, and the data in it can be manipulated and updated independently of the
database. When work with the DataSet is finished, the changes can be written back to the
central database. The data in a DataSet can be loaded from any valid data source, such
as a Microsoft SQL Server database, an Oracle database or a Microsoft Access database.
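The disconnected pattern described above can be illustrated outside .NET. The sketch below is a Python analogy built on the standard sqlite3 module, not the ADO.NET DataSet API: rows are copied into plain dictionaries, edited without touching the database, and then written back in one transaction. The patients table, its columns and the function names are hypothetical.

```python
import sqlite3


def load_snapshot(conn):
    """Copy the rows into plain dicts: a disconnected, in-memory snapshot."""
    cur = conn.execute("SELECT id, name, stage FROM patients ORDER BY id")
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def save_changes(conn, snapshot):
    """Write the edited snapshot back to the central database in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "UPDATE patients SET name = :name, stage = :stage WHERE id = :id",
            snapshot,
        )


# Usage with a hypothetical in-memory 'patients' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, stage TEXT)")
conn.execute("INSERT INTO patients VALUES (1, 'Anil', 'S1')")
conn.commit()

snapshot = load_snapshot(conn)
snapshot[0]["stage"] = "S2"  # edited in memory; the database row is still 'S1'
save_changes(conn, snapshot)
```

As with a DataSet, nothing reaches the database until the changes are explicitly saved.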
2. Data Provider
The Data Provider is responsible for providing and maintaining the connection to
the database. A Data Provider is a set of related components that work together to
provide data in an efficient, performance-driven manner. The .NET Framework
includes, among others, the SQL Server Data Provider, which is designed to work only
with Microsoft SQL Server 7.0 or later, and the OleDb Data Provider, which allows
connections to other types of databases such as Access and Oracle.
Query Analyzer offers a quick, informal way to run queries against any of your
SQL Server databases. It is a convenient way to pull information out of a database in
response to a user request, test queries before implementing them in other applications,
create or modify stored procedures and execute administrative tasks.
SQL Profiler provides a window into the inner workings of your database. You
can monitor many different event types and observe database performance in real time.
SQL Profiler allows you to capture and replay system "traces" that log various activities.
It's a great tool for optimizing databases with performance issues or troubleshooting
particular problems.
Service Manager is used to control the MSSQLServer (the main SQL Server
process), MSDTC (Microsoft Distributed Transaction Coordinator) and SQL Server Agent
processes. An icon for this service normally resides in the system tray of machines
running SQL Server. You can use Service Manager to start, stop or pause any one of
these services.
Training Dataset

Patient   S1   S2   S3   Disease
Anil      X    A    P    CKD
Ajay      X    B    Q    CKD
Kumar     Z    A    R    CKD

New patient data: Akash, with constraints S1 = X, S2 = A, S3 = R.
Disease to predict: CKD or NOT CKD.
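Each conditional probability is estimated with the m-estimate

$$P = \frac{n_c + m\,p}{n + m}$$

where n is the number of training examples of the class, n_c is the number of those
examples that have the attribute value, p = 0.5 is the prior estimate and m = 3 is the
equivalent sample size. With n = 2 for each class:

$$
\begin{aligned}
P(S1{=}X \mid \text{CKD}) &= \frac{2 + 3(0.5)}{2 + 3} = 0.7, & P(S1{=}X \mid \text{NOT CKD}) &= \frac{0 + 3(0.5)}{2 + 3} = 0.3\\
P(S2{=}A \mid \text{CKD}) &= \frac{2 + 3(0.5)}{2 + 3} = 0.7, & P(S2{=}A \mid \text{NOT CKD}) &= \frac{0 + 3(0.5)}{2 + 3} = 0.3\\
P(S3{=}R \mid \text{CKD}) &= \frac{1 + 3(0.5)}{2 + 3} = 0.5, & P(S3{=}R \mid \text{NOT CKD}) &= \frac{1 + 3(0.5)}{2 + 3} = 0.5
\end{aligned}
$$

Multiplying the three conditional probabilities by the class prior p = 0.5:

$$\text{CKD}: 0.7 \times 0.7 \times 0.5 \times 0.5 = 0.1225 \qquad \text{NOT CKD}: 0.3 \times 0.3 \times 0.5 \times 0.5 = 0.0225$$

Since 0.1225 > 0.0225, the new patient Akash is classified as CKD.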
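The arithmetic of this example can be checked with a short Python sketch of the m-estimate Naive Bayes score. The per-class counts are those of the worked example (n = 2 examples per class, m = 3, p = 0.5, class prior 0.5), using n_c = 0 for S2 = A under NOT CKD, which is the count consistent with the 0.3 used in the final product. The function names are hypothetical.

```python
from math import prod


def m_estimate(n_c, n, m=3, p=0.5):
    """m-estimate of P(attribute value | class) = (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)


def naive_bayes_score(counts, n, prior, m=3, p=0.5):
    """Class prior times the m-estimate of each of the new patient's attribute values."""
    return prior * prod(m_estimate(n_c, n, m, p) for n_c in counts)


# n_c for the new patient's values S1 = X, S2 = A, S3 = R in each class.
counts_by_class = {"CKD": [2, 2, 1], "NOT CKD": [0, 0, 1]}

scores = {
    label: naive_bayes_score(counts, n=2, prior=0.5)
    for label, counts in counts_by_class.items()
}
prediction = max(scores, key=scores.get)
print(prediction)  # CKD
```

The m-estimate keeps a single unseen value (n_c = 0) from forcing a class score to zero.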
C4.5 is one of the top algorithms in data mining. It was developed by Ross Quinlan.
In this project, the C4.5 algorithm has been implemented to predict the CKD stage of
patients based on their clinical test constraints.
Step 4: Create a decision node based on a_best, retrieving the nodes [patients] whose
attribute values match a_best.
Step 5: Recur on the sub-lists [lists of patients] and count the outcomes [stages], termed
sub nodes. The new node is classified by the outcome with the highest count.
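For reference, textbook C4.5 chooses a_best as the attribute with the highest gain ratio, that is, information gain divided by split information. The Python sketch below computes that criterion; it is not the project's code, and the function names are hypothetical. The sample example that follows ranks values by feature count instead, so the two rules need not select the same attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, label):
    """C4.5 split criterion for column `attr`: information gain / split information."""
    total = len(rows)
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row[label])
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy([row[label] for row in rows]) - remainder
    split_info = -sum(
        len(part) / total * log2(len(part) / total) for part in partitions.values()
    )
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attrs, label):
    """a_best: the candidate column with the highest gain ratio."""
    return max(attrs, key=lambda attr: gain_ratio(rows, attr, label))
```

For rows shaped (patient, constraint 1, constraint 2, constraint 3, stage), `best_attribute(rows, [1, 2, 3], 4)` returns the column index of a_best.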
Sample Example
Training Dataset
Anil X A P S1
Kumar X B Q S1
Ajay Y B P S2
Naveen Z A R S1
Akash Z A Q S2
Sort ();
Feature Count
A 3
X 2
R 1
Output
Stage Priority
S1 2
S2 1
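The sample can be reproduced with the count-based rule it illustrates: rank the new patient's values by how often they occur in their column, take the most frequent one as a_best, keep the matching patients and choose the stage with the highest count. The Python sketch below assumes that the new patient's values are X, A and R (the values listed under Feature Count) and that each value is counted within its own column; the names are hypothetical, and this is the report's simplified rule rather than gain-ratio C4.5.

```python
from collections import Counter

# Training rows from the sample: (patient, constraint 1, constraint 2, constraint 3, stage).
TRAINING = [
    ("Anil", "X", "A", "P", "S1"),
    ("Kumar", "X", "B", "Q", "S1"),
    ("Ajay", "Y", "B", "P", "S2"),
    ("Naveen", "Z", "A", "R", "S1"),
    ("Akash", "Z", "A", "Q", "S2"),
]
STAGE = 4  # column index of the stage label


def feature_counts(new_values, rows=TRAINING):
    """Count each new value within its own column, highest count first (Sort ())."""
    counts = [
        ((col, value), sum(1 for row in rows if row[col] == value))
        for col, value in enumerate(new_values, start=1)
    ]
    return sorted(counts, key=lambda item: item[1], reverse=True)


def predict_stage(new_values, rows=TRAINING):
    """a_best is the most frequent value; the majority stage of its patients wins."""
    (col, value), _ = feature_counts(new_values, rows)[0]
    matches = [row for row in rows if row[col] == value]  # Step 4
    priority = Counter(row[STAGE] for row in matches)     # Step 5
    return priority.most_common(1)[0][0], priority


stage, priority = predict_stage(("X", "A", "R"))
print(stage, dict(priority))  # S1 {'S1': 2, 'S2': 1}
```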
main ()
{
LOGIN ();    // LOGIN opens the Admin, Receptionist, Doctor or Patient module
}
LOGIN ()
{
GET User_type;
GET User_Id/Email_Id;
GET Password;
If (entered User_Id/Email_Id and Password match a record in the Users table)
{
User_type is fetched from the database;
LOGIN SUCCESSFUL;
If (User_type == 1)
Admin ();
Else If (User_type == 2)
Receptionist ();
Else If (User_type == 3)
Doctor ();
Else If (User_type == 4)
Patient ();
}
Else
LOGIN FAILED;
}
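The LOGIN () routine should verify the credentials first and only then dispatch on the stored User_type. A minimal Python sketch of that flow, assuming an in-memory dictionary in place of the Users table (the USERS records, e-mail addresses and function names are hypothetical):

```python
import hmac

# Hypothetical stand-in for the Users table: email -> (password, User_type).
# A real system would store salted password hashes, not plain text.
USERS = {
    "admin@example.com": ("admin-pass", 1),
    "doctor@example.com": ("doctor-pass", 3),
}

# User_type codes as used in the LOGIN pseudocode.
HOMEPAGES = {1: "Admin", 2: "Receptionist", 3: "Doctor", 4: "Patient"}


def login(email, password, users=USERS):
    """Return the module to open for valid credentials, else 'LOGIN FAILED'."""
    record = users.get(email)
    # Verify the credentials before looking at the user type.
    if record is None or not hmac.compare_digest(
        record[0].encode("utf-8"), password.encode("utf-8")
    ):
        return "LOGIN FAILED"
    user_type = record[1]  # taken from the stored record, not from the login form
    return HOMEPAGES.get(user_type, "LOGIN FAILED")


print(login("doctor@example.com", "doctor-pass"))  # Doctor
print(login("doctor@example.com", "wrong"))        # LOGIN FAILED
```

Looking the module up in a dictionary keeps the four role branches in one place.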
Admin ()
{
add_staff ();
Stages ();
Constraint ();
Values ();
Account ();
}
add_staff ()
{
GET User_type;
GET User_Id;
GET Password;
GET Email_Id;
If (entered User_Id already exists in the Users table)
User_Id already exists;
Else
Staff is added;
}
Stages ()
{
GET Stage;
If (entered Stage already exists in the Stages table)
Stage exists;
Else
Stage is added;
}
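Stages () and add_staff () both check for an existing entry before inserting. A minimal Python sketch of the same behaviour with the standard sqlite3 module lets the database enforce uniqueness through a PRIMARY KEY, which avoids a separate check-then-insert step; the stages table schema and the add_stage name are hypothetical.

```python
import sqlite3


def add_stage(conn, stage):
    """Insert a stage unless it already exists; returns the messages used in Stages ()."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO stages (name) VALUES (?)", (stage,))
    except sqlite3.IntegrityError:  # raised by the PRIMARY KEY on a duplicate
        return "Stage exists"
    return "Stage is added"


# Usage with a hypothetical in-memory stages table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stages (name TEXT PRIMARY KEY)")
print(add_stage(conn, "stage1"))  # Stage is added
print(add_stage(conn, "stage1"))  # Stage exists
```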
Constraint ()
{
GET Constraint;
If (entered Constraint already exists in the database)
Constraint exists;
Else
Constraint is added;
}
Doctor ()
{
Upload_patientdetails();
View_patientdetails();
Result();
Treatment_details();
}
Upload_patientdetails()
{
Get patient_details();
Add_constraints();
}
Add_constraints()
{
Get patient_constraints();
}
View_patientdetails()
{
Display_patientdetails();
}
Result ()
{
If (result == CKD)
{
Patient has CKD;
If (stage == stage1)
Patient has stage1;
Else If (stage == stage2)
Patient has stage2;
Else If (stage == stage3)
Patient has stage3;
Else If (stage == stage4)
Patient has stage4;
Else
Patient has stage5;
}
Else
Patient does not have CKD;
}
Treatment_details ()
{
If(stage==stage1)
Display treatment_details_of_stage1;
Else if(stage==stage2)
Display treatment_details_of_stage2;
Else If(stage==stage3)
Display treatment_details_of_stage3;
Else If(stage==stage4)
Display treatment_details_of_stage4;
Else
Display treatment_details_of_stage5;
}
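Result () and Treatment_details () are chains of stage comparisons. The Python sketch below expresses the same logic with a stage tuple and a dictionary lookup; unlike the pseudocode, it rejects an unknown stage instead of falling through to stage5. The DETAILS text is a hypothetical placeholder for what the Treatment table stores.

```python
STAGES = ("stage1", "stage2", "stage3", "stage4", "stage5")


def result(prediction, stage=None):
    """Mirror of Result (): report the CKD stage, or that the patient does not have CKD."""
    if prediction != "CKD":
        return "Patient does not have CKD"
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return f"Patient has CKD: {stage}"


def treatment_details(stage, details_by_stage):
    """Mirror of Treatment_details (): one dictionary lookup instead of an if/else chain."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return details_by_stage[stage]


# Hypothetical placeholder text; the real details come from the Treatment table.
DETAILS = {s: f"treatment_details_of_{s}" for s in STAGES}
print(result("CKD", "stage3"))               # Patient has CKD: stage3
print(treatment_details("stage3", DETAILS))  # treatment_details_of_stage3
```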
Account()
{
GET old_password;
GET new_password;
GET confirm_password;
if(old_password==existing password && new_password==confirm_password)
password changed successfully
else
unsuccessful
}
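Account () compares plain-text passwords. As a hedged alternative rather than the project's implementation, the Python sketch below keeps the same two conditions (the old password must match, and the new password must equal its confirmation) but stores a salted PBKDF2 hash from the standard hashlib module and compares digests in constant time. The function names and the iteration count are assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256, with a random 16-byte salt by default."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def change_password(stored, old_password, new_password, confirm_password):
    """Mirror of Account (): returns (record to store, message)."""
    salt, digest = stored
    _, candidate = hash_password(old_password, salt)
    old_ok = hmac.compare_digest(candidate, digest)  # constant-time comparison
    if old_ok and new_password == confirm_password:
        return hash_password(new_password), "password changed successfully"
    return stored, "unsuccessful"


stored = hash_password("old-pass")
stored, message = change_password(stored, "old-pass", "new-pass", "new-pass")
print(message)  # password changed successfully
```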
Receptionist()
{
Upload_patientdetails();
Billing();
Account();
}
Patient ()
{
View_treatmentdetails ();
Give_feedback ();
Account ();
}
View_treatmentdetails ()
{
GET treatment_details;
}
Give_feedback ()
{
Upload_feedback;
}
5.8 Advantages
We have developed five models for CKD progression from stage I to stage V and
validated them internally. Our models leverage different types of variables
(demographic, laboratory and/or clinical documentation data that are collected
routinely during clinical care as part of the electronic health record) as well as
the longitudinal aspect of the records as encoded through filters.
We found that text is a valuable predictor for CKD progression and that the use of
time series models to characterize patient state can substantially improve
predictive accuracy for progression. In particular, the model which incorporated
demographic, laboratory, and clinical documentation data had the highest
concordance of the models considered.
Risk prediction in CKD has been studied extensively, with dozens of available
risk models with acceptable performance (discrimination 0.56–0.94). Most
developed classifiers use readily obtainable information, including age,
demographics, and laboratory data. Hence, laboratory data, comorbidities, and
occasional vital signs are the sole dimensions of contemporary CKD classifiers.
Age, sex, and eGFR are included in almost all models, but fewer than half use
proteinuria (qualitative assessment or quantitative proteinuria or albuminuria),
serum creatinine, serum albumin, or blood pressure.
5.9 Limitations
The models we designed and validated are based on data from a single institution.
While there is value in focusing on a single institution at a time (the risk predictions
are relevant to the characteristics of the institution’s patient population for instance),
the model validity and its generalizability would be better demonstrated over data
from several institutions.
Short of training a model for data from different institutions, the models presented in
this study are in theory portable to different institutions. In particular, the
unsupervised NLP techniques described here (topic modeling) are actually conducive
to such an approach, as they identify patterns in the language of any given corpus
without any prior knowledge of the topics or vocabulary to expect. To address the
potential differences in language from one institution to another, the topic models
would have to be learned on documentation from the new institutions.
Chapter 6
TESTING
Testing is the process of evaluating a system or its components with the intent of
finding whether it satisfies the specified requirements. This activity yields the actual
results, the expected results, and the difference between them. In simple words, testing
is executing a system in order to identify any gaps, errors or missing requirements
contrary to the actual requirements.
Testing is the practice of making objective judgments regarding the extent to
which the system (device) meets, exceeds or fails to meet stated objectives.
The technique of testing without any knowledge of the internal workings of the
application is black box testing. The tester is oblivious to the system architecture and
does not have access to the source code. Typically, when performing a black box test, a
tester interacts with the system's user interface by providing inputs and examining
outputs without knowing how and where the inputs are processed.
White box testing is the detailed investigation of the internal logic and structure of
the code. It is also called glass box testing or open box testing. To perform white box
testing on an application, the tester needs knowledge of the internal workings of the
code, looks inside the source code and finds out which unit or chunk of the code is
behaving inappropriately.
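The difference between the two approaches can be seen in a small Python unittest sketch. The unit under test, homepage_for, is hypothetical: the black box tests are written from the specification alone (an input and its expected output), while the white box test is written after reading the code, so that its error branch is also exercised.

```python
import unittest


def homepage_for(user_type):
    """Hypothetical unit under test: map a User_type code to a homepage."""
    pages = {1: "admin", 2: "receptionist", 3: "doctor", 4: "patient"}
    if user_type not in pages:
        raise ValueError(f"unknown user type: {user_type}")
    return pages[user_type]


class BlackBoxTests(unittest.TestCase):
    """Written from the specification only: given an input, check the output."""

    def test_doctor_lands_on_doctor_homepage(self):
        self.assertEqual(homepage_for(3), "doctor")

    def test_patient_lands_on_patient_homepage(self):
        self.assertEqual(homepage_for(4), "patient")


class WhiteBoxTests(unittest.TestCase):
    """Written from the source: make sure the error branch is exercised too."""

    def test_unknown_user_type_takes_error_branch(self):
        with self.assertRaises(ValueError):
            homepage_for(99)
```

The tests can be run with `python -m unittest` on the file that contains them.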
Test ID | Test Case | Test Steps | Input | Expected Result | Actual Result | Status
TC009 | Registered User | Log in to the system by entering email and password (user is admin) | Email = abc@gmail.com, Password = **** | Logged in as Admin and navigated to the admin homepage | Successful login | Pass
TC010 | Registered User | Log in to the system by entering email and password (user is doctor) | Email = abc@gmail.com, Password = **** | Logged in as Doctor and navigated to the doctor homepage | Successful login | Pass
TC011 | Registered User | Log in to the system by entering email and password (user is receptionist) | Email = abc@gmail.com, Password = **** | Logged in as Receptionist and navigated to the receptionist homepage | Successful login | Pass
TC012 | Registered User | Log in to the system by entering email and password (user is patient) | Email = abc@gmail.com, Password = **** | Logged in as Patient and navigated to the patient homepage | Successful login | Pass
TC021 | View add constraints | Click on the add constraints link from the admin homepage | - | Add constraints page should be displayed | Add constraints page was not displayed; error message displayed | Fail
TC022 | View add constraints | Click on the add constraints link from the admin homepage | - | Add constraints page should be displayed | Add constraints page displayed | Pass
TC023 | View add ranges | Click on the add ranges link from the admin homepage | - | Add ranges page should be displayed | Add ranges page was not displayed; error message displayed | Fail
TC024 | View stages | Click on the Stages link in the admin homepage | - | Add stages page should be displayed | Add stages page displayed successfully | Pass
TC025 | Setting the different types of stages | Admin enters the different stages | - | Different stages should be accepted and stored in the database | Different stages accepted and saved successfully | Pass
TC026 | View add ranges | Click on the add ranges link from the admin homepage | - | Add ranges page should be displayed | Add ranges page displayed | Pass
TC027 | View add stages | Click on the add stages link from the admin homepage | - | Add stages page should be displayed | Add stages page was not displayed; error message displayed | Fail
TC039 | View upload treatment | Click on the upload treatment link from the doctor page | - | Upload treatment page should be displayed | Upload treatment page was not displayed; error message displayed | Fail
TC040 | View upload treatment | Click on the upload treatment link from the doctor homepage | - | Upload treatment page should be displayed | Upload treatment page displayed | Pass
TC041 | View reporting | Click on the reporting link from the doctor homepage | - | Generate report page should be displayed | Generate report page displayed successfully | Pass
TC042 | Generating report | Doctor sets the particular disease type and stage to generate a report | - | Report should be generated for the particular disease type and stage | Report generated successfully | Pass
TC043 | View result | Click on the view result link from the doctor homepage | - | Result page should be displayed | Result page was not displayed; error message displayed | Fail
TC045 | View add constraints | Click on the add constraints link from the doctor homepage | - | Add constraints page should be displayed | Add constraints page was not displayed; error message displayed | Fail
TC046 | View add constraints | Click on the add constraints link from the doctor homepage | - | Add constraints page should be displayed | Add constraints page displayed | Pass
TC047 | View change password | Click on the change password link from the doctor homepage | - | Change password page should be displayed | Change password page was not displayed; error message displayed | Fail
TC048 | View change password | Click on the change password link from the doctor homepage | - | Change password page should be displayed | Change password page displayed | Pass
TC049 | View patient registration | Click on the patient registration link from the receptionist homepage | - | Patient registration page should be displayed | Patient registration page was not displayed; error message displayed | Fail
TC050 | View patient registration | Click on the patient registration link from the receptionist homepage | - | Patient registration page should be displayed | Patient registration page displayed | Pass
TC051 | View billing details | Click on the billing details link from the receptionist homepage | - | Billing details page should be displayed | Billing details page was not displayed; error message displayed | Fail
TC052 | View billing details | Click on the billing details link from the receptionist homepage | - | Billing details page should be displayed | Billing details page displayed | Pass
TC053 | View patient details | Click on the patient details link from the receptionist homepage | - | Patient details page should be displayed | Patient details page was not displayed; error message displayed | Fail
TC054 | View patient details | Click on the patient details link from the receptionist homepage | - | Patient details page should be displayed | Patient details page displayed | Pass
TC056 | View change password | Click on the change password link from the receptionist homepage | - | Change password page should be displayed | Change password page was not displayed; error message displayed | Fail
TC057 | View change password | Click on the change password link from the receptionist homepage | - | Change password page should be displayed | Change password page displayed | Pass
TC058 | Admin sign out | Click on the sign out link in the admin homepage | - | Admin should be redirected to the website homepage | Website homepage was not displayed; error message displayed | Fail
TC073 | Admin sign out | Click on the sign out link in the admin homepage | - | Admin should be redirected to the website homepage | Website homepage displayed | Pass
TC074 | Update treatment details | Click on the update treatment details link from the patient homepage | - | Treatment details should be updated | Treatment details updated successfully | Pass
TC075 | Delete treatment details | Click on the delete treatment details link from the patient homepage | - | Treatment details should be deleted | Treatment details deleted | Pass
Chapter 7
SNAPSHOTS
Login Page
Login Types
Invalid Condition
Doctor Login
Admin Login
Reporting Page
Change Password
To Add Constraints
CONCLUSION
Chronic Kidney Disease has been predicted and diagnosed using two data mining classifiers: ANN
and Naive Bayes. In this work, the factors considered include age, diabetes, blood pressure and
RBC count. The work can be extended by considering other parameters such as food type,
working environment, living conditions, availability of clean water and other environmental
factors for kidney disease detection. This project is a medical sector application that helps
medical practitioners predict disease types based on symptoms. Patients can also obtain a
prediction by entering their symptoms as sentences. The application automates disease
prediction: it identifies the disease, its types and complications from the clinical database
efficiently and economically. This is accomplished by applying the Naive Bayes algorithm, a
classification technique from data mining. The proposed work takes symptoms as input and
predicts the disease based on previous patients' data.
FUTURE ENHANCEMENT
Query Module
A query module could be added as a future enhancement, through which the doctor,
receptionist and admin of the application can interact with each other.
REFERENCES
[1] Aditya Sunda N., Pushpa Latha P., Rama Chandra M.(2012, June). Performance Analysis of
Classification Data Mining Techniques over Heart Disease Data Base. International Journal of
Engineering Science and Advanced Technology(IJESAT). (pp. 470-478),2012.
[2] Ilayaraja M., Meyyappan T. (2013, February).Mining Medical Data to Identify Frequent
Diseases using Apriori Algorithm. In Pattern Recognition, Informatics and Mobile Engineering
(PRIME), 2013 International Conference on (pp. 194-199).IEEE.
[5] Mostafa Ghannnad Rezaie, Hamid Soltanian Zadeh. Interactive Knowledge Discovery for
Lobe Epilepsy. International Journal of Advanced Science and Technology,(pp. 45-48),2013.
[6] Neha Sharma, Er. Rohit Kumar Verma (2016,September).Prediction of Kidney Disease by
using Data Mining Techniques. International Journal of Advance Research in Computer Science
and Engineering (IJARCSSE), vol 6, issue 9, 2016.
[7] Rajan J. R., Chelvan C. C. (2013, December). A Survey on Mining Techniques for Early
Lung Cancer Diagnoses. In Green Computing, Communication and Conservation of Energy
(ICGCE), 2013 International Conference on (pp. 918-922).IEEE.
[8] Lakshmi K. R., Nagesh Y., VeeraKrishna M. (2014). Performance Comparison of Three Data
Mining Techniques for Predicting Kidney Dialysis Survivability. International Journal of
Advances in Engineering & Technology (IJAET), 7(1), 242-254, 2014.
[9] Agarwal Y., Pandey H. M. (2014, September). Performance Evaluation of Different
Techniques in The Context of Data Mining-A case of an eye disease. In Confluence the Next
Generation Information Technology Summit (Confluence), 2014 5th International Conference-
(pp. 72-76). IEEE.
[10]Ahmed S., Tanzir Kabir M., Tanzeem Mahmood N., Rahman R.M. (2014, December).
Diagnosis of Kidney Disease using Fuzzy Expert System. In Software, Knowledge, Information
Management and Applications (SKIMA), 2014 8th International Conference on (pp. 1-8).IEEE.