Tambi 2018-FPR
FACULTY OF ENGINEERING
Ankit Tambi
(2017/2018)
The candidate confirms that the work submitted is their own and the appropriate credit has
been given where reference has been made to the work of others.
I understand that failure to attribute material which is obtained from another source may be
considered as plagiarism.
ABSTRACT
The Emergency Room (ER) is a high-priority entry point to the hospital, where
patients arrive in very severe condition and require immediate attention. Failure to provide
quick and effective treatment may even lead to death.
To avoid such outcomes, the ER needs good team interaction among professionals,
which can be ensured by defining their exact roles and relations. Moreover, proper
management of all required resources/inventories beforehand may reduce the long waiting
times caused by a lack or scarcity of necessary resources. Managing all this is a very tedious
job; however, Process Mining (PM) in healthcare helps to understand the complex processes
by analysing their process models using software such as ProM and Disco.
The process model generated, using this methodology, gave us insights into the
order and frequency of the procedure events executed for patients diagnosed with
Pneumonia, admitted through ER. This information can help focus on the commonly
executed procedure events, such as chest x-ray, and help reduce waiting time and chaos,
thereby decreasing the discomfort of the patients during their stay at the ER.
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
TABLE OF CONTENTS
1 INTRODUCTION
1.1 Overview
Emergency care, a field that developed only in recent decades, deals with patients
arriving in critical condition. The patients arrive with a collection of symptoms of a range of
severity which have to be assessed rapidly. Following this, instantaneous decisions have to
be made, which lead to a collection of events occurring in a complex manner for the rapid
effective recovery of the patient. Thus, studying the processes in the ER (Emergency Room)
can lead to a better understanding of those processes as well as increased transparency
[2].
This project develops a data reference model from the MIMIC-III database to answer
Frequently-Posed Questions (FPQs) posed by experts, using the Question Driven Methodology
(QDM), which was designed by Rojas E. et al. specifically for answering FPQs about ER
processes. Process Mining (PM) tools, techniques and algorithms are applied to ER data for
pneumonia patients to generate process models. Analysing these models shows which
activities are frequently executed for pneumonia patients, which gives us an idea of which
resources to focus on in order to add more structure to the processes and decrease the
waiting time.
Since the scope of this project is limited, rather than holding workshops with the
clinicians, we use MIMIC-III data. MIMIC-III is a huge, freely accessible database
comprising healthcare-related information on more than 40,000 patients admitted to critical
care units between 2001 and 2012.
The aim of this project is to answer one of the FPQs posed by ER experts using the
QDM developed by Eric Rojas et al. specifically for ER processes, by means of PM tools,
techniques and algorithms. I have developed a Data Reference Model (DRM), which helps
to focus on solving a Process Discovery Question (in this case for pneumonia) and to gain
insights into what frequently happens in the ER. This in turn helps to manage
resources/inventories better in order to reduce waiting time and chaos.
1.4 Aim
1.5 Objectives
2. To learn the basic workings of several Process Mining tools and techniques.
3. To analyze the interaction among various healthcare professionals at ER using the event
log.
1.6 Deliverables
1) The models on ER processes
Months : TASK
At the initial stage of this project, to gain a better understanding of Process Mining
and its different tools, techniques and algorithms, an online course by Van der Aalst on
Coursera named "Process Mining: Data Science in Action" [17] was taken. Later, in order to
gain further knowledge of the ProM software, the "Introduction to Process Mining with ProM"
[18] course from FutureLearn was studied.
After the initial training in PM and PM tools (ProM and Disco), the next step
was to gain access to the MIMIC-III dataset. MIMIC-III is a huge database that can be
regarded as big data, a topic I was already familiar with from a module studied in my second
semester. With that background, I sought to identify which of the V's of big data apply to a
huge database like MIMIC-III.
Then I studied some of the frequently used Process Mining methodologies, namely
PM2 (Process Mining Project Methodology), the clear path method and the L* life-cycle method.
Later, Eric Rojas visited the university and gave a presentation on the use of the
Question-Driven Methodology in healthcare for answering the questions frequently posed by
healthcare experts about Emergency Room (ER) processes. After attending his presentation
and meeting him personally, I decided to reproduce his latest paper, on a Question Driven
Methodology for analyzing ER processes using Process Mining, on the MIMIC-III database.
Emergency care has, in recent decades, become recognized as a specialty in its
own right. It is provided at the Emergency Room (ER) to patients arriving in critical
condition who require urgent attention and care, making the ER the first contact point with
the healthcare system for unscheduled and undifferentiated patients of all ages [10,9,1].
Following the screening by the nurses, the emergency medicine professionals further
examine the patients, in their order of severity, and provide their assistance [3]. Their
primary aim is to fully treat the patients and help them recover. Whenever that's not
immediately possible, they aim towards alleviating their set of symptoms. To achieve this,
rapid assessment followed by quick and accurate decisions for the recovery of the patient
have to be made through a systematic collaboration among various healthcare
professionals, making team interaction and collaboration a key aspect for the success at the
ER. Their coordinated collaboration is critical for the prompt care of the patients, who arrive
at the ER in a delicate health condition requiring immediate attention [3].
Many studies point towards inadequate team interaction in the ER as the primary or
contributing cause of over half the malpractice claims [1,2]. A severe impact on the patients'
treatment at the ER has been observed in the absence of an adequate team structure and/or
appropriately defined goals and responsibilities for all team members. In some cases,
non-involvement of the relevant team members in the decision-making process, the lack of a
standard protocol and poor prioritization of activities, followed by poor communication, led to
further chaos. This puts the patients at higher risk by wasting precious time in cases of
utmost emergency [3].
On the other hand, a good team interaction aiding the planning and execution of ER
processes can significantly reduce morbidity and mortality rates. This can also make the
patients' stay at the ER much more convenient. Moreover, this can reduce clinical error
rates, which in turn reduces legal costs, and can even affect other human aspects, such as
reducing stress and frustration among patients as well as healthcare professionals [3].
Thus, by applying Process Mining on the stored database through the use of QDM
and getting answers from it, we can overcome most of these challenges faced in the ER.
scope for the improvement of productivity by figuring out well-defined patterns of several
processes and the roles of the healthcare professionals in the latter, thus reducing the chaos
and waiting times while at the same time reducing costs [4].
This can be done by observing and analysing many of the complex non-trivial and
time-consuming processes (clinical and administrative) occurring at the ER in order to come
up with more efficient suggestions [4,2]. One possible approach commonly undertaken is
conducting interviews. Unfortunately, this is a time-consuming process. Moreover, the
suggestions are quite often highly subjective since each person in the healthcare processes
tends to have an ideal scenario in mind, which in reality is only one of the many scenarios
possible [4].
The extraction of an event log is the starting point for Process Mining. In an ER, the
event log can be viewed as a set of episodes (cases), each consisting of several procedures
or activities that were executed for a particular instance [3]. The event logs in the ER records
also store additional information about the events, such as the timestamp and data elements
(e.g., the age and sex of the patient). In fact, whenever possible, Process Mining techniques
use extra information (e.g., resources) [4]. An example of an event log is given in fig 2.1.
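To make the structure of an event log concrete, the following is a minimal sketch, assuming each event is a (case ID, activity, timestamp) triple. The case IDs, activities and timestamps are invented for illustration and are not taken from MIMIC-III; `build_traces` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case id, activity, timestamp). All values are made up.
events = [
    ("100001", "Chest X-Ray", "2130-01-01 10:30:00"),
    ("100001", "Admission", "2130-01-01 09:00:00"),
    ("100002", "Admission", "2130-02-03 14:00:00"),
    ("100001", "Discharge", "2130-01-02 08:00:00"),
    ("100002", "Discharge", "2130-02-03 20:00:00"),
]


def build_traces(rows):
    """Group events by case and order the activities of each case by time."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in rows:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        cases[case_id].append((parsed, activity))
    # Sorting the (timestamp, activity) pairs puts each case in time order.
    return {cid: [act for _, act in sorted(evts)] for cid, evts in cases.items()}


traces = build_traces(events)
for case_id, trace in traces.items():
    print(case_id, trace)
```

Each resulting list is one trace (the ordered activities of one episode), which is the form most Process Mining techniques operate on.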
Process Mining seeks to observe event data (i.e., observed behaviour) to determine
process models [4]. The three main types of Process Mining are discovery, conformance and
enhancement. A discovery technique seeks to produce a process model from the event log
without using any previous (a-priori) information. Conformance typically follows discovery
or enhancement, since an existing process model (formed through discovery or
enhancement) is compared with an event log of the same process. Conformance checking
is done to check whether the process model conforms with the event log and vice versa,
expressed in terms of fitness [4]. The third type of Process Mining is enhancement. This
deals with extending and/or improving an existing process model using information about
the actual process recorded in the event log [4].
However, Process Mining being an emerging field still has several limitations to its
application. This includes the limited availability and implementation of healthcare data that
is process aware and that records event logs. Further, data extraction and interpretation (to
respond to questions frequently posed by experts) can prove to be a tedious job involving
the high dependence on ER experts; In order to identify opportunities for applying Process
2.4.1 ProM
2.4.2 Disco
2.4.3 MySQL
2.4.4 HeidiSQL
2.5.1 MIMIC-III
The version of MIMIC used in this project is MIMIC-III v1.4, which stores
more than 40,000 hospital admissions, of which 38,645 are adults and 7,875
are neonates [15].
2.5.2 Disco
The version of Disco used in this project is 2.2.1, released on
28/08/2018.
2.5.3 ProM
The versions used in this project are ProM PM 6.8 and
XESame 1.8.
2.5.4 HeidiSQL
The version of HeidiSQL used in this project is 9.4.0.5125
(32 bit), compiled on 2016-10-21.
The version of MySQL used in this project is 8.0.13, build
13780177 CE (64 bits).
MIMIC-III has over 40,000 hospital admissions [15] and is enriched with
detailed patient information. To gain access to such a huge, freely available database
for research purposes, the following steps have to be followed:
2. Then, pass the ethics test. After passing ethics test user can
3. Apply to access the MIMIC-III database through Physionet by providing it CITI test
completion report as soon as the application gets approved user can get access to
4. Download all the tables from the MIMIC-III database Initially, tables are in the form of
compressed CSV files and user needs to
2.7 MIMIC-III
The data is spread across 26 relational tables that include information such as diagnoses,
diagnostic codes, bedside measurements of vital signs, laboratory observations, notes
charted by caregivers, admission and discharge times and locations, survival data, relevant
personal patient information and more. These tables are linked to each other through various
IDs such as SUBJECT_ID, HADM_ID, ITEMID and CGID.
For that purpose, a Data Reference Model was developed specifically for answering
the questions frequently posed by experts [1] about ER processes; it is discussed in
the Data Reference Model chapter.
Process Mining bridges the gap between the data obtained by Data Mining and the
algorithm implemented through Machine Learning. Multiple methodologies have been
developed to support Process Mining Projects such as Process Diagnostic Method (PDM),
L* life-cycle model and Process Mining Project Methodology (PM2).
2.9 Conclusion
**not implemented in this project since it was beyond the scope of this project.
Most of the questions suggested by Eric Rojas et al. based on an HIS database can also
be answered through the MIMIC-III database. They divided these questions into 2 types:
general and episode oriented.
Process discovery is about discovering process models to describe the control flow
of activities. For example, "What is the process model for the procedures performed while
treating patients with a particular diagnosis?"
Conformance checking measures the fitness of the process model against the event
log to verify whether the processes correspond with that model. This can help to check from
time to time whether internal protocols are being followed. The higher the fitness, the fewer
the deviations from the process model.
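The relation between fitness and deviations can be illustrated with a deliberately simplified sketch. It assumes the model is just a set of allowed directly-follows pairs and defines fitness as the share of traces with no disallowed step; this is a cruder measure than the replay-based fitness reported by tools such as ProM, and the function names, model and log are all hypothetical.

```python
def trace_fits(trace, allowed_pairs):
    """A trace fits if every consecutive pair of activities is allowed by the model."""
    return all(pair in allowed_pairs for pair in zip(trace, trace[1:]))


def naive_fitness(traces, allowed_pairs):
    """Share of traces that replay on the model without any deviation."""
    if not traces:
        return 0.0
    return sum(trace_fits(t, allowed_pairs) for t in traces) / len(traces)


# Hypothetical model: the directly-follows steps that the protocol permits.
model = {
    ("Admission", "Imaging"),
    ("Imaging", "Discharge"),
    ("Admission", "Discharge"),
}

# Hypothetical log: the third trace deviates (Cultures is not in the model).
log = [
    ["Admission", "Imaging", "Discharge"],
    ["Admission", "Discharge"],
    ["Admission", "Cultures", "Discharge"],
    ["Admission", "Imaging", "Discharge"],
]

fitness = naive_fitness(log, model)
print(fitness)  # 3 of the 4 traces fit
```

With one deviating trace out of four, the value drops below 1; removing the deviation would bring it back to 1.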
Stay duration-oriented questions are based on the total duration of a patient's stay at
the ER. The characteristics of patients staying for different durations can be examined through
this type of question. For example, "What are the characteristics of the patients staying for
a duration shorter than 24 hours?" This can be answered by filtering for those variants with a
stay shorter than 24 hours. A dotted chart of patients with Pneumonia showing variants vs stay
duration is given in fig 3.2.
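A stay-duration filter of this kind can be sketched as follows, assuming each case has an admission and a discharge timestamp in the same text format. The case IDs and times are invented, and `short_stays` is a hypothetical helper; in this project the equivalent filtering was done with the tools described later.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical stays: case id -> (admission time, discharge time).
stays = {
    "100001": ("2130-01-01 09:00:00", "2130-01-02 08:00:00"),  # 23 hours
    "100002": ("2130-02-03 14:00:00", "2130-02-06 10:00:00"),  # almost 3 days
}


def short_stays(stays, limit=timedelta(hours=24)):
    """Return the case ids whose total stay is strictly shorter than the limit."""
    selected = []
    for case_id, (admitted, discharged) in stays.items():
        duration = datetime.strptime(discharged, FMT) - datetime.strptime(admitted, FMT)
        if duration < limit:
            selected.append(case_id)
    return selected


print(short_stays(stays))
```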
The ER patient discharge-driven questions are concerned with the destination of the
patients following ER. The patient could either be discharged home or admitted to other units
of the hospital. These questions seek to find the characteristics that differ in either
destination. For example, "What are the activities executed for the patients that are
discharged home?"
Further compound questions concerning more than one of the above criteria may
also be answered. Such questions require combined data from all those basic questions.
Filters are applied to this data to reach the required answers. For example, "What are the
characteristics of the patients with pneumonia that are discharged home within 24 hours of
their admission to the ER?"
All these questions can be solved through the Data Reference Model specifically
made for ER from MIMIC-III database described in Data Reference Model chapter.
4 METHODOLOGY
This stage is concerned with the identification of the data to be used for the FPQs
followed by its extraction from its source. Based on this, a data model is created. This
requires checking for the availability of timestamps, defining the events or activities, creating
any extra fields wherever necessary and verifying the quality of the extracted data. This is
summed up in Table 4.1 below.
Table 4.1 Guidelines for the data extraction stage (MIMIC-III) [1]
1.1: Identify available data in MIMIC-III and build the data model | Have access to the correct data from the direct sources, them being MIMIC-III. | Make sure you have permissions and access granted to them directly or through the data owner. Identify if data are missing in the data sources, and check if it is feasible to execute the analysis (e.g., timestamps are the minimum required data). The available data model should contain as many dimensions and attributes of the data reference model as possible.
1.2: Ensure availability of a | Check that for each event or activity included in the | If different levels of accuracy are present, the highest one present in
1.3: Name events | In case any activity or event does not have an appropriate name, one should be assigned to it. | Use meaningful names for the ER experts.
1.5: Verify data quality | Further general issues have been identified from the literature review that must be tackled when generating an event log for process mining purposes in healthcare. | Check lack of data, incorrect data or the inaccuracy and irrelevance of data. Check in more detail all of the significant challenges previously found in the literature.
This stage selects specific data (for example, including only necessary columns such
as case, time, event and necessary resources) to generate an event log from the extracted
data based on the FPQs. This stage may need to be revisited several times as the need
for more resources becomes apparent. Table 4.2 sums up the activities of this
stage.
Table 4.2 Guidelines for the event log creation stage. FPQ, Frequently-Posed Question [1]
2.1: Identify data required to perform the specific analysis | Identify the FPQ to be answered and identify what data from the general data model will be used. | Have clarity and a good understanding of the FPQs that are desired to be answered.
2.2: Create the event log | Once the data stored in the data model are available, a specific event log must be created each time a question requires a response. | Establish the format in which the event log will be built. Tools such as Excel with comma separated values files can be used, but more specific standards (such as XES) should also be considered.
The filtering stage, as the name suggests, involves filtering the event log based on
the specific requirements of the question such as filtering for particular diagnoses and further
filtering of data to exclude certain variants. This filtering is mainly done through Process
Mining tools especially the ones in Disco. At times, MS-Excel can also be used to filter
certain attributes.
The data is analysed to discover different patterns and to gain more insights into
the datasets. This is done through different data analysis techniques and tools. Appropriate
data analysis techniques are to be selected based on the expected results, and the
tools required to perform the selected techniques are then identified.
The Process Mining stage includes the selection and application of appropriate
techniques through the use of tools corresponding to those techniques. This stage focuses
on discovery, analysis and improvement of real processes. This can be done through a
variety of Process Mining tools including Disco [12] and ProM [16]. Four types of process
analysis are performed: process discovery, conformance analysis, performance analysis
and organizational analysis.
Process discovery based on the event log results in the discovery of a model that
describes the activities and paths taken in different cases. This is done through several
discovery techniques and model types, such as the Fuzzy model (which is more flexible in
nature), the heuristic miner, Petri nets and the inductive miner.
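As a rough illustration of what discovery starts from, the sketch below counts how often one activity directly follows another across all traces. This directly-follows count is only a simple building block, not an implementation of any of the miners named here, and the log is invented for illustration.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


# Hypothetical traces, one list of activities per episode.
log = [
    ["Admission", "Imaging", "Cultures", "Discharge"],
    ["Admission", "Imaging", "Discharge"],
    ["Admission", "Cultures", "Discharge"],
]

dfg = directly_follows(log)
for (source, target), frequency in dfg.most_common():
    print(f"{source} -> {target}: {frequency}")
```

The pairs can be read as the arcs of a process map and the counts as the arc frequencies; hiding low-frequency arcs gives a simplified map.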
Performance analysis is performed from the time perspective. It considers the activity
duration and waiting time of activities to discover bottlenecks and waiting times. This can be
performed through tools such as ProM.
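The time perspective can be sketched as follows, assuming every event carries both a start and an end timestamp. The activity duration is taken as end minus start, and the waiting time as the gap between the end of one event and the start of the next. The events are invented, and the helper assumes each activity occurs only once in the case.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical events of one case: (activity, start time, end time).
case = [
    ("Admission", "2130-01-01 09:00:00", "2130-01-01 09:00:00"),
    ("Chest X-Ray", "2130-01-01 09:45:00", "2130-01-01 10:00:00"),
    ("Blood Cultured", "2130-01-01 10:30:00", "2130-01-01 10:35:00"),
]


def durations_and_waits(events):
    """Return activity durations and waiting times between events, in minutes."""
    parsed = sorted(
        (datetime.strptime(start, FMT), datetime.strptime(end, FMT), activity)
        for activity, start, end in events
    )
    durations = {act: (end - start).total_seconds() / 60 for start, end, act in parsed}
    waits = []
    for (_, prev_end, prev_act), (next_start, _, next_act) in zip(parsed, parsed[1:]):
        # Overlapping events are treated as having no waiting time.
        gap = max((next_start - prev_end).total_seconds() / 60, 0.0)
        waits.append((prev_act, next_act, gap))
    return durations, waits


durations, waits = durations_and_waits(case)
print(durations)
print(waits)
```

The largest gaps in `waits` point to candidate bottlenecks.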
Organizational analysis is done from the resource perspective. It identifies the roles
and relations between the resources during the case execution. This can be performed
through organizational metrics in ProM.
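One common organizational metric is the handover of work, that is, how often one resource is followed by another within a case. The sketch below is a minimal illustration under that assumption; it is not ProM's implementation, and the caregiver identifiers and cases are invented.

```python
from collections import Counter


def handovers(cases):
    """Count how often work passes from one resource to a different one."""
    counts = Counter()
    for events in cases:
        resources = [resource for _, resource in events]
        for giver, receiver in zip(resources, resources[1:]):
            if giver != receiver:  # consecutive work by the same resource is no handover
                counts[(giver, receiver)] += 1
    return counts


# Hypothetical cases: lists of (activity, caregiver id) in time order.
cases = [
    [("Admission", "CG1"), ("Imaging", "CG2"), ("Cultures", "CG2"), ("Discharge", "CG1")],
    [("Admission", "CG1"), ("Imaging", "CG2"), ("Discharge", "CG3")],
]

handover_counts = handovers(cases)
print(handover_counts)
```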
Question Analysis
This stage involves evaluation of the obtained results by relevant ER experts
(who know the complete process, including each task performed) by means of
questionnaires, interviews and focus groups [1]. The experts may infer that the obtained
results are irrelevant or uncommon, upon which previous stages will have to be revisited to
verify whether the techniques and filters were applied appropriately.
The data collected and used is derived from MIMIC-III, a huge
healthcare-related database with medical records of over 40,000 patients admitted to a
tertiary-care hospital (Beth Israel Deaconess Medical Center) between 2001 and 2012 [15].
ADMISSIONS table contains information about the admission of the patient to the
hospital, such as the time and location of admission as well as discharge. Further, their initial
diagnosis is also contained in this table. The episodes (or cases) are referred through 2
unique IDs (SUBJECT_ID, HADM_ID). The SUBJECT_ID refers to the unique ID of a
patient, while the HADM_ID refers to a single admission at the hospital, that is, a patient may
have a single SUBJECT_ID and multiple HADM_IDs, each referring to a single different
admission at the hospital.
These IDs link the ADMISSIONS table to all the –EVENTS tables. Different –EVENTS
tables contain records of different types of events (for example, the
PROCEDUREEVENTS_MV table contains the records of all the procedures performed on
the patients), described in the form of ITEMIDs. These ITEMIDs on their own impart no
understanding, since they are just numbers. The full descriptions of what each of these
ITEMIDs means are given in the D_ITEMS table, which needs to be linked to the –EVENTS
tables via the ITEMID.
linking of D_ITEMS table, and the different categories of processes can be separately
analyzed one by one [5].
The MIMIC-III database is an extension of MIMIC-II, that is, MIMIC-III contains data
from MIMIC-II (collected between 2001 and 2008) in addition to the newly collected data.
This transition was accompanied by a transition in the data management software, from
the CareVue (CV) system (used in MIMIC-II) to the MetaVision (MV) system, which
led to several changes in the way the information was stored and interpreted.
Moreover, in order to protect patient confidentiality, all the dates in the database have
been shifted randomly to some point in the future. However, internal consistency w.r.t. the
patient is maintained so that the timestamped-data for the same patient in all the tables is
synchronizable.
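The effect of such date shifting on process analysis can be illustrated with a small sketch. It assumes that one offset is applied to all timestamps of the same patient; the offset, the dates and the helper name are invented, and this is not a description of how MIMIC-III was actually de-identified.

```python
from datetime import datetime, timedelta


def shift_patient(timestamps, offset_days):
    """Shift all timestamps of one patient by the same number of days."""
    offset = timedelta(days=offset_days)
    return [t + offset for t in timestamps]


# Hypothetical admission and discharge times of one patient.
admitted = datetime(2010, 3, 1, 9, 0)
discharged = datetime(2010, 3, 2, 8, 0)

shifted_admit, shifted_disch = shift_patient([admitted, discharged], offset_days=43830)

# The absolute dates change, but the length of stay is preserved.
print(shifted_admit, shifted_disch)
print(shifted_disch - shifted_admit == discharged - admitted)
```

Durations and the order of events within a patient therefore remain usable for Process Mining, whereas comparisons of absolute dates between patients do not.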
Since this project specifically seeks to study the processes in the ER, only the
records of those patients admitted through the ER were included. Further, only the records
collected through the MetaVision (MV) system (2008 onwards) were used. A question-driven
approach was used on these records to describe the processes occurring in the ER. The
questions used were obtained from the study by Eric Rojas et al. (March 2017).
With that as a reference, the question considered for this methodology is "What are
the procedure events executed while treating patients diagnosed with pneumonia admitted
through ER?"
Stage 1 and 2:
The data was extracted from the MIMIC-III database and then cleansed.
Out of the aforementioned 26 tables, the tables necessary for answering the question
were selected. In this case, the records from the ADMISSIONS, PROCEDUREEVENTS_MV
and D_ITEMS tables were used. This data needed to be cleaned and edited. At first, this
didn't seem very important; however, later, when the tables (in comma-separated values
(CSV) format) were being imported into the SQL tools to be joined for generating the event
log, several errors appeared. The RDBMS used was MySQL and the graphical tool used to
import and join the tables was HeidiSQL.
This was the lengthiest of all the stages, given the size, complexity and inconsistency
of the records. Information was missing in several places, which generated data truncation
errors each time, since the data in those rows was shorter than expected. These rows had to
be manually excluded every time. Further, the date-time formats threw several errors. At first,
the format had to be changed to "yyyy/mm/dd hh:mm:ss" before importing into the SQL
database. In the later stages, the date-time data would be treated as text. One possible
approach was to use the option for formatting text to date; however, that led to the
exclusion of the time part, which is a crucial component of the event log. Eventually,
all steps had to be repeated from the beginning to include the date-time column from the
source.
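The date-time clean-up described here can also be scripted before import. The following is a minimal sketch, assuming the raw values arrive in one of a few known text formats; the list of candidate formats is an assumption for illustration, and `normalise` is a hypothetical helper. It keeps the time part and returns `None` for values that cannot be parsed, so that such rows can be reviewed instead of silently truncated.

```python
from datetime import datetime

# Assumed input formats; extend this tuple to match the actual source files.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
    "%d/%m/%Y %H:%M",
)

TARGET_FORMAT = "%Y/%m/%d %H:%M:%S"  # the "yyyy/mm/dd hh:mm:ss" layout


def normalise(value):
    """Convert a date-time string to the target layout, or return None if unparseable."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    return None


print(normalise("2130-01-01 09:05:00"))
print(normalise("01/02/2130 09:05"))
print(normalise(""))
```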
Only selected columns from the data were imported. Moreover, the records were
filtered to include only the records with the attribute EMERGENCY ROOM ADMIT from the
ADMISSION_LOCATION column in the ADMISSIONS table. Following this, the columns
imported were HADM_ID, ADMIT_TIME, DISCH_TIME and DIAGNOSIS from the
ADMISSIONS table. Preference was given to HADM_ID over SUBJECT_ID since each
patient may have several different entries with different symptoms and diagnoses each time,
and the processes being analyzed were specific towards diagnoses rather than the patients.
The diagnosis was derived from the ADMISSIONS table itself since it contained the brief
diagnosis given in the ER based on the initial symptoms the patient first appeared at the ER
with.
Once the required tables were imported, they had to be joined. The type of join was
decided on the basis of requirements. An example query for joining the ADMISSIONS table
with the PROCEDUREEVENTS_MV table is as follows:
FROM admissions a
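Only part of the original query is preserved here. As an illustration of the kind of join described, the following minimal sketch uses Python's built-in sqlite3 module rather than MySQL, with tiny invented tables. The column names follow the text, the rows and ITEMID values are made up, and the query is an assumption rather than the query actually used in the project. An inner join on HADM_ID gives the same rows as a cross join filtered with a WHERE clause on matching HADM_IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE admissions (hadm_id INTEGER, admission_location TEXT, diagnosis TEXT);
    CREATE TABLE procedureevents_mv (hadm_id INTEGER, start_time TEXT, itemid INTEGER);

    -- Invented rows for illustration only.
    INSERT INTO admissions VALUES
        (100001, 'EMERGENCY ROOM ADMIT', 'PNEUMONIA'),
        (100002, 'CLINIC REFERRAL/PREMATURE', 'SEPSIS');
    INSERT INTO procedureevents_mv VALUES
        (100001, '2130-01-01 10:30:00', 101),
        (100001, '2130-01-01 09:45:00', 102),
        (100002, '2130-02-03 15:00:00', 101);
    """
)

# Keep only the procedure events that belong to admissions through the ER.
rows = conn.execute(
    """
    SELECT a.hadm_id, p.start_time, p.itemid, a.diagnosis
    FROM admissions a
    JOIN procedureevents_mv p ON p.hadm_id = a.hadm_id
    WHERE a.admission_location = 'EMERGENCY ROOM ADMIT'
    ORDER BY a.hadm_id, p.start_time
    """
).fetchall()

for row in rows:
    print(row)
```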
The newly generated table (combined_table_1) was later joined similarly with the
D_ITEMS table to add 2 more columns (LABEL, CATEGORY). Moreover, it was observed
that the starting (as well as ending) events of the cases were different. This was quite
problematic for the application of several Process Mining techniques. Hence, admission
and discharge events had to be added to the event log alongside the other procedures. For
this, the HADM_ID and ADMIT_TIME columns were extracted from the ADMISSIONS table to
match the HADM_ID and START_TIME columns of the event log, respectively.
To match the LABEL and CATEGORY columns, corresponding columns were
added with the attribute 'admission'. All the rows from this table were then added to the
initial event log. Similar steps were taken for adding the discharge event, using HADM_ID and
DISCH_TIME instead and setting the LABEL and CATEGORY attributes to DISCHARGE. A
sample of the event log thus generated is shown in fig 5.1.
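The step of adding admission and discharge events can be sketched as follows, assuming the event log rows have the form (HADM_ID, time, LABEL, CATEGORY) and the admissions rows the form (HADM_ID, admission time, discharge time). The rows are invented and `add_boundary_events` is a hypothetical helper, not the procedure actually carried out in HeidiSQL.

```python
def add_boundary_events(event_log, admissions):
    """Append one ADMISSION and one DISCHARGE event per admission, then sort the log."""
    rows = list(event_log)
    for hadm_id, admit_time, disch_time in admissions:
        rows.append((hadm_id, admit_time, "ADMISSION", "ADMISSION"))
        rows.append((hadm_id, disch_time, "DISCHARGE", "DISCHARGE"))
    # Timestamps in "YYYY-MM-DD hh:mm:ss" text form sort correctly as strings.
    return sorted(rows, key=lambda row: (row[0], row[1]))


# Hypothetical data for one admission.
event_log = [(100001, "2130-01-01 10:30:00", "Chest X-Ray", "Imaging")]
admissions = [(100001, "2130-01-01 09:00:00", "2130-01-02 08:00:00")]

full_log = add_boundary_events(event_log, admissions)
for row in full_log:
    print(row)
```

After this step every case in the sketch starts with ADMISSION and ends with DISCHARGE, provided no procedure is recorded outside the stay.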
Stage 3. Filtering:
The filtering of the data was done at several levels. As mentioned in the previous
section, the ADMISSIONS table data was filtered to only include the cases that were
admitted through the Emergency Room. This was done through MS-Excel. Later only the
procedure events corresponding to these ER patients were included in the final event log,
through the use of CROSS JOIN with the WHERE clause. This was done through HeidiSQL.
It was noted in the next stage that some of the variants (about 1% of the total) didn't begin
with admission and/or end with discharge. This might have occurred since several records
with data quality issues were excluded in Stage 1. These variants were filtered through
Disco.
Furthermore, to specifically show the processes in Pneumonia patients, the event log
was filtered to only include the cases with the diagnosis of Pneumonia. Hence, it can be
noted that the filtering can be done through several applications as suitable.
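The two filters described in this stage, keeping only complete cases and keeping only pneumonia cases, can be sketched as below. In the project these were applied in Disco; the traces, diagnoses and helper names here are invented, and matching the diagnosis by substring is an assumption.

```python
def keep_complete_cases(traces, start="ADMISSION", end="DISCHARGE"):
    """Keep only the cases that begin with admission and end with discharge."""
    return {
        case_id: trace
        for case_id, trace in traces.items()
        if trace and trace[0] == start and trace[-1] == end
    }


def keep_diagnosis(traces, diagnoses, keyword="PNEUMONIA"):
    """Keep only the cases whose diagnosis text contains the keyword."""
    return {
        case_id: trace
        for case_id, trace in traces.items()
        if keyword in diagnoses.get(case_id, "").upper()
    }


# Hypothetical traces and diagnoses keyed by HADM_ID.
traces = {
    100001: ["ADMISSION", "Imaging", "DISCHARGE"],
    100002: ["Imaging", "DISCHARGE"],  # does not begin with admission
    100003: ["ADMISSION", "Cultures", "DISCHARGE"],
}
diagnoses = {100001: "PNEUMONIA", 100002: "PNEUMONIA", 100003: "SEPSIS"}

filtered = keep_diagnosis(keep_complete_cases(traces), diagnoses)
print(filtered)
```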
In this stage, data were imported to the Process Mining tools and analysed. The
mean duration and frequency of the activities/events were noted. Further, the process map
generated in Disco with 40% activities and 10% paths was as shown in fig 5.3(b). This was
when the labels of the procedures were used as the event.
Fig 5.3(b) Process map with LABEL as activity (40% activities & 10% path)
Upon the analysis of this map, it became clear that further data analysis was
required. The information in the CATEGORY column categorizes the activities into 10 types.
Eventually, CATEGORY was used as the event instead of LABEL. The following map,
showing 100% activities and 10% paths, was generated when these changes were made.
Fig 5.4 Process map with CATEGORY as activity (100% activities & 10% path)
Further, we tried to analyze the process for a particular diagnosis, in this case
pneumonia. So, only the cases with Pneumonia as the diagnosis (around 7
thousand cases) were selected. Out of these, the roughly 1% of cases that didn't begin with
admission and/or didn't end with discharge were excluded. The data and processes
were then analyzed, first in Disco and later in ProM.
Figs 5.5(a) and 5.5(b) below show the result of process discovery through
Disco with 10% and 70% of all the paths, respectively.
Fig 5.5(a) Process map for pneumonia with CATEGORY as activity (100% activities &
10% path)
Fig 5.5(b) Process map for pneumonia with CATEGORY as activity (100% activities &
70% path)
It is to be noted that the darkness of the colour of the activities in the above process
maps is proportional to the frequency of the activities. This is also shown in fig 5.6(a). It
can be observed that imaging and culture techniques are performed quite frequently. In fact,
on average, more than one imaging procedure is performed per episode. Thus, the
total case duration can be shortened quite significantly by any decrease in the duration of
these activities, either by increasing the availability of the tools (or machines) and resources
required to perform them, or by improving the techniques to reduce their actual
duration. Upon further analysis of the frequency of the actual activities (with LABEL as the
event), it can be noted that chest x-rays, EKGs and blood, sputum and urine cultures are
done in over half the cases, on average.
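The two frequency measures used in this paragraph, the share of cases in which an activity occurs and the average number of occurrences per case, can be computed as in the sketch below. The traces are invented and do not reproduce the figures reported for the pneumonia cases.

```python
from collections import Counter


def activity_stats(traces):
    """Return, per activity, (share of cases containing it, mean occurrences per case)."""
    total = Counter()     # all occurrences of each activity
    per_case = Counter()  # number of cases in which each activity occurs at least once
    for trace in traces:
        total.update(trace)
        per_case.update(set(trace))
    n_cases = len(traces)
    return {act: (per_case[act] / n_cases, total[act] / n_cases) for act in total}


# Hypothetical traces with CATEGORY as the activity.
log = [
    ["Imaging", "Imaging", "Cultures"],
    ["Imaging"],
    ["Cultures"],
    ["Imaging", "Imaging"],
]

stats = activity_stats(log)
for activity, (case_share, mean_per_case) in stats.items():
    print(f"{activity}: in {case_share:.0%} of cases, {mean_per_case:.2f} per case")
```

A mean above 1 per case, as for Imaging in this toy log, corresponds to the observation that more than one imaging procedure is performed per episode on average.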
Following this, conformance checking was done in ProM using the conformance technique
"Replay log on Petri Net for Conformance analysis". This required an
event log and a Petri Net model. The Petri Net model was generated through process
discovery techniques and is shown in fig 5.7.
The results obtained through the conformance analysis of the above Petri Net are
shown in fig 5.8 below.
The trace fitness, as shown in the fig 5.9(b), is 0.977, which is almost 1, meaning that
most of the traces fit perfectly with the process model. Access line – peripheral is one activity
where the moves aren't synchronous, that is, it has no corresponding activity in the model
that was enabled during the replay.
At this point, attempts to repair the log by adding the missing event can be done.
However, ER processes are highly flexible [1] in nature and the data is noisy with several
missing events. Discovery algorithms like heuristics miner were developed to deal with such
data, since they take the log as it is, and don't try to repair missing events [6]. The Petri Net
generated by Interactive Data-Aware Heuristics Miner is as shown in fig 5.10.
Fig 5.10 The Petri Net generated by Interactive Data-Aware Heuristics Miner
5.3 Conclusion:
While implementing all these steps, we were able to develop a Data Reference
model, specifying the exact relations of only the necessary tables and their necessary
columns. This model has been described in the next chapter.
The initial step is to filter only the episodes admitted through the Emergency Room,
from the ADMISSIONS table. It should be kept in mind that all the cases in other tables are
for all the patients admitted to the hospital. So, they all cannot, directly, be used in the event
log. Thus, the ADMISSIONS table forms the base of all the tables (Primary level table); and
other tables to be used must only be cross joined to the filtered ADMISSIONS table where
the HADM_IDs from both or all the tables match.
The tertiary level tables are further linked to elements in the secondary level tables.
The D_ITEMS table links through ITEMID to all the –EVENTS tables except LABEVENTS and
CPTEVENTS. D_LABITEMS links similarly through ITEMID to LABEVENTS. These dictionary
tables give detailed descriptions of the ITEMIDs used in the event tables. The D_CPT table,
which links to the CPTEVENTS table, has not been used.
The CAREGIVERS table is also a tertiary level table that describes and links through
the CGID to the –EVENT tables. D_ICD_DIAGNOSES and D_ICD_PROCEDURES are
more tables on this level but haven't been used in this model.
Primary table:
1. ADMISSIONS
Secondary tables:
1. CHARTEVENTS
2. DATETIMEEVENTS
3. INPUTEVENTS_CV
4. INPUTEVENTS_MV
5. LABEVENTS
6. MICROBIOLOGYEVENTS
7. NOTEEVENTS
8. OUTPUTEVENTS
9. PATIENTS
10. PRESCRIPTIONS
11. PROCEDUREEVENTS_MV
12. SERVICES
Tertiary tables:
1. D_ITEMS
ITEMID, LABEL, CATEGORY; this table is to be filtered on LINKSTO, depending
upon which –EVENTS table it is being linked to.
2. D_LABITEMS
3. CAREGIVERS
CGID, LABEL
The primary keys that are required to link with other tables are highlighted. Other
columns may be included, if and when needed, for further details.
7 PROJECT EVALUATION
Different process models have also been generated using software such as ProM and
Disco.
Analysing these process models showed which activities are frequently executed for
pneumonia patients. This indicates which resources to focus attention on in order to add
more structure to the processes and decrease the waiting time.
Through this project, I learned to trust the process and to trust myself in order to
reach the goal, which could never have been achieved without the help of my
professors. Moreover, I learned how to handle a larger project by myself.
I learned to write a thesis and went through all the steps involved, such as learning
more about academic writing, referencing, and citation.
This project gave me a golden opportunity to read and learn about the work of many
other researchers in related areas, and through that I learned that the world of work is
full of collaborative effort and depends on the relationships you build with the subject
and its authors.
I learned to use software and write code, which had been my biggest fear.
I enjoyed the reading and understanding part of the project most, but found the report
writing quite difficult.
The next time I work on another project, it will be much easier for me because of this
experience. Handling a project on my own, going through so many data tables, learning
and understanding them thoroughly, implementing the procedure and getting the result
will definitely help me in the real world, and I am very grateful for it.
In this project, the processes in the ER, particularly for patients with pneumonia, were
studied. This gave us more insight into the frequency of the activities executed, which, in
turn, can help manage the resources beforehand.
The scope of this project was to answer one example of the FPQs posed by ER experts.
Future work may involve answering more complex FPQs. Moreover, there is huge
scope for developing a search platform for healthcare by connecting the actual
databases of every hospital within the country and applying PM techniques to them, through
which we could gain more insights not only into the workings of the ER but also into every
sector of healthcare.
With the help of PM techniques in healthcare, we can gain more insights about:
2. How many healthcare staff, and of what specialization, are required in that part of the
country for that particular season.
This would make healthcare much more manageable and efficient, decreasing morbidity
and mortality.
REFERENCES
3. Alvarez, C. et al. Discovering role interaction models in the Emergency Room using
Process Mining. Journal of Biomedical Informatics 78 (2018), pp.60–77.
4. Mans, Ronny S., van der Aalst, Wil M.P. and Vanwersch, Rob J.B. Process Mining in
Healthcare. Switzerland: Springer International Publishing AG, 2015.
5. Kong, Xiaoqing. Process Mining of Big Data in Healthcare. MSc Computer Science. The
University of Leeds. 2016-17.
6. Mans, Ronny S. et al. Repairing Event Logs Using Stochastic Process Models.
[Online]. [No Year]. [Accessed 11 Feb 2019]. Available from:
https://pdfs.semanticscholar.org/
7. National Center for Biotechnology Information. Improving the Nation’s Health Care
System. [Online]. 2009. [Accessed 5 Feb 2019]. Available from:
https://www.ncbi.nlm.nih.gov/
11. Eindhoven University of Technology. Process Mining. [Online]. 2016. [Accessed 18 July
2018]. Available from: http://www.processmining.org/
12. Fluxicon BV. Discover Your Processes. [Online]. 2019. [Accessed 18 July 2018].
Available from: https://fluxicon.com/disco/
13. Oracle Corporation and/or its affiliates. MySQL Documentation. [Online]. 2019.
[Accessed 05 March 2019]. Available from: https://dev.mysql.com/
14. Becker, A. HeidiSQL. [Online]. 2019. [Accessed 23 Feb 2019]. Available from:
https://www.heidisql.com/
15. MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L,
Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. MIMIC-
III Critical Care Database. [Online]. 2016. [Accessed 21 June 2018]. Available from:
http://www.nature.com/
16. Eindhoven University of Technology. ProM Tools. [Online]. 2010. [Accessed 5 July
2018]. Available from: http://www.promtools.org/
18. Future Learn. Introduction To Process Mining With ProM. [Online]. 2018. [Accessed 21
Oct 2018]. Available from: https://www.futurelearn.com/
APPENDIX A
Tables in MIMIC-III
CALLOUT: Provides information when a patient was READY for discharge from the ICU,
and when the patient was actually discharged from the ICU
CPTEVENTS: Contains current procedural terminology (CPT) codes, which facilitate billing
for procedures performed on patients
DIAGNOSES_ICD: Contains ICD diagnoses for patients, most notably ICD-9 diagnoses
ICUSTAYS: Defines each ICUSTAY_ID in the database, i.e. defines a single ICU stay
LABEVENTS: Contains all laboratory measurements for a given patient, including outpatient
data
PROCEDURES_ICD: Contains ICD procedures for patients, most notably ICD-9 procedures
APPENDIX B
CODE 1:
FROM procedure_event p
CODE 2:
FROM admission_event ;
CODE 3:
FROM discharge_event ;
CODE 4:
FROM combined_table_1 c