Stay Prediction Report
ON
Healthcare
Prediction IN
Department Of Computer Science
OF
BE(CSE)
Chakshita (2011981258)
Abhinav (2011985047)
Table Of Contents
Declaration
Acknowledgement
Abstract
CHAPTER 1: INTRODUCTION
Background
Problem Statement
Project Aim
Chapter Overview
CHAPTER 2: METHODOLOGY
Dataset Description
Data Acquisition and Preparation
Exploratory Data Analysis
Feature Engineering & Selection
Data Pre-Processing (Splitting & Balancing the data)
Proposed Algorithms for Classification
CHAPTER 3: EXPERIMENTAL RESULTS
CHAPTER 4: CONCLUSION & FUTURE SCOPE
CHAPTER 5: REFERENCES
Acknowledgement
I would like to convey my heartfelt gratitude to Mr. Shivam Singh, my mentor, for his invaluable
advice and assistance in completing my project. He was there to assist me every step of the way,
and his motivation is what enabled me to accomplish my task effectively. I would also like to
thank all of the other supporting personnel who assisted me by supplying the equipment that was
essential and vital, without which I would not have been able to perform efficiently on this
project.
I would also like to thank Chitkara University for accepting my project in my desired field of
expertise. I would also like to thank my friends and parents for their support and encouragement
as I worked on this assignment.
DECLARATION
We, the undersigned, hereby declare that the project work titled 'Healthcare Analytics,' submitted
as part of our Bachelor’s degree in Computer Science and Engineering (CSE) at Chitkara
University, Punjab, is an authentic record of our own work. This project was carried out under
the guidance and supervision of Mr. Shivam Singh.
Throughout the course of this project, we have conducted in-depth research, analysis, and
implementation, focusing on the field of Healthcare Analytics. We affirm that the ideas,
methodologies, and results presented in this project are the product of our own efforts and
represent a genuine contribution to the field.
We also acknowledge the guidance and support provided by our supervisor, Mr. Shivam Singh,
whose expertise and mentorship have been instrumental in shaping the direction and quality of
our work. Any external sources of information, data, or assistance utilized during the project
have been duly cited and acknowledged in accordance with academic integrity and ethical
standards.
Furthermore, we understand the importance of academic honesty and take full responsibility for
the content and originality of our project work. We have adhered to the guidelines and
regulations set forth by Chitkara University for the completion of academic projects.
This declaration is made in good faith to affirm the authenticity of our project work and to
uphold the principles of academic integrity.
Signature
Abstract
The rapid global spread of the Coronavirus Disease (COVID-19) has posed a severe threat to
healthcare systems worldwide. The exponential rise in infected patients has led to an increased
demand for Intensive Care Unit (ICU) beds, and the shortage of hospital resources and bed
capacity stands as a critical factor influencing the escalating death rates associated with
COVID-19.
Efforts to address the shortage of medical resources have included the implementation of specific
guidelines to prioritize patients and determine their eligibility for ICU admission based on the
severity of their condition. While these measures are crucial for resource management, there is a
potential downside. The United Kingdom experienced instances where patients adhering to home
quarantine tragically succumbed to the virus, and their deteriorating condition went unnoticed for
up to two weeks, revealing an unintended consequence of these strategies.
Balancing the need for stringent resource allocation guidelines with the imperative to safeguard
patient lives remains a formidable challenge for healthcare systems grappling with the
unprecedented demands imposed by the COVID-19 pandemic.
CHAPTER 1: INTRODUCTION
1.1 Background
The development of the Stay Prediction Model is rooted in the imperative need for hospitals to
enhance operational efficiency and resource allocation. With the growing complexity of
healthcare systems and the increasing demand for optimal patient care, accurately predicting the
Length of Stay (LOS) has become a strategic priority. The background for this predictive model
is shaped by the desire to streamline hospital operations by anticipating patient needs and
optimizing resource utilization. Leveraging historical patient data, the model incorporates a
diverse range of factors such as demographics, medical history, and admission details to classify
patients into specific LOS categories. This approach not only facilitates precise bed management,
staffing, and financial planning but also contributes to improved patient care through better
discharge planning and post-discharge coordination. As hospitals navigate the challenges of
providing quality healthcare while managing resources judiciously, the Stay Prediction Model
emerges as a vital tool, aligning the healthcare industry with data-driven insights for more
effective and sustainable practices.
This parameter helps hospitals to identify patients of high LOS risk (patients who will stay
longer) at the time of admission. Once identified, patients with high LOS risk can have their
treatment plan optimized to minimize LOS and lower the chance of staff/visitor infection. Also,
prior knowledge of LOS can aid in logistics such as room and bed allocation planning.
Suppose you have been hired as a Data Scientist at HealthMan, a not-for-profit organization
dedicated to managing the functioning of hospitals in a professional and optimal manner.
1.3 Project Aim
The aim of this project is to accurately predict the Length of Stay (LOS) for individual patients,
which is crucial for hospitals to optimize resource allocation. The LOS is divided into 11 classes,
ranging from 0-10 days to more than 100 days, providing a detailed framework for anticipating
patient needs.
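The 11-class scheme can be illustrated with a small sketch. Only the first class (0-10 days) and the last class (more than 100 days) are stated in this report; the intermediate 10-day bins, the label strings, and the function name `los_class` are assumptions made for illustration.

```python
# Assumed label scheme: "0-10", then nine 10-day bins, then an open-ended class.
# Only the first and last bounds are given in the report.
LOS_CLASSES = (
    ["0-10"]
    + [f"{lo}-{lo + 9}" for lo in range(11, 100, 10)]  # "11-20" ... "91-100"
    + ["More than 100 Days"]
)


def los_class(days: int) -> str:
    """Map a stay length in days to one of the 11 (assumed) class labels."""
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    if days <= 10:
        return "0-10"
    if days > 100:
        return "More than 100 Days"
    lo = ((days - 1) // 10) * 10 + 1  # lower bound of the 10-day bin
    return f"{lo}-{lo + 9}"


print(len(LOS_CLASSES), los_class(25), los_class(101))
```

Treating LOS as a set of ordered classes rather than a raw number of days is what makes the task a multi-class classification problem.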
Machine learning algorithms, trained on historical patient data, analyze various factors to classify
patients into specific LOS categories. This information enables hospitals to streamline discharge
planning, allocate resources judiciously, and improve the quality of care. Accurate LOS
predictions also support financial planning by providing insights into the costs associated with
patient stays. Implementing LOS prediction models facilitates optimal resource utilization,
enhancing patient care and overall hospital operational efficiency.
Chapter I: Introduction
This chapter presents the problem statement, the reasons for selecting it, the contribution made
toward solving it, and the execution plan.
Comprising 17 attributes and 318438 instances, the dataset features 2 continuous random
variables and 15 discrete random variables. The variables cover a spectrum of patient-related
information, including age, admission diagnosis, insurance status, comorbidities, family medical
history, and initial and subsequent grades of medical conditions. Prior to analysis, the dataset
undergoes preprocessing to address missing values and eliminate duplicate entries. Subsequently,
exploratory data analysis is conducted, visualizing trends and patterns through various graphs
and charts. Feature engineering is then implemented to enhance the predictive power of the
dataset, followed by the development of a classification model using diverse machine learning
algorithms. This comprehensive approach aims to provide healthcare institutions with a valuable
tool for predicting and managing patient stays effectively.
The second important check is whether the dataset contains any duplicate rows. Here too we are
lucky: the dataset does not contain any duplicate rows.
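A minimal sketch of these two checks with pandas is shown here. The small frame, its column names, and the choice to fill a missing value with the column mode are assumptions for illustration; the report does not state how missing values were actually handled.

```python
import pandas as pd

# Tiny made-up frame standing in for the patient data (column names are assumed).
df = pd.DataFrame({
    "case_id": [1, 2, 2, 3],
    "Department": ["gynecology", "radiotherapy", "radiotherapy", "anesthesia"],
    "Bed Grade": [2.0, 3.0, 3.0, None],
})

# Check 1: number of missing values in each column.
missing_per_column = df.isnull().sum()

# Check 2: number of rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# One possible clean-up: drop repeated rows, then fill gaps with the column mode.
cleaned = df.drop_duplicates().copy()
cleaned["Bed Grade"] = cleaned["Bed Grade"].fillna(cleaned["Bed Grade"].mode()[0])

print(missing_per_column)
print("duplicate rows:", n_duplicates)
```

On a dataset with no duplicates, as reported here, `n_duplicates` would be 0 and `drop_duplicates()` would leave the frame unchanged.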
Exploratory data analysis is a common approach for summarizing a dataset. Because our patient
data has a very large number of examples and features, analyzing the data before making any
prediction is important. The graphs and plots below are the ones we included in the project to
understand and analyze the information and insights contained in the dataset.
Fig 1: Pie chart to show the percentage distribution of the Target Variable
The pie chart shows the distribution of bed occupancy by length of stay, revealing how healthcare
resources are used over time. The largest share, 27.5%, corresponds to patients with a stay
duration of 0 to 10 days. The 41 to 50 day range follows closely at 24.5%, so these two classes
together account for roughly half (52%) of the total bed occupation.
To strategically emphasize these pivotal periods, the chart employs the "explode" technique,
accentuating slices associated with both shorter and longer stays. The enlarged figure size
contributes to an enhanced overall presentation, while explicit labels and percentages foster a
nuanced understanding of the bed occupancy landscape.
Titled "Distribution of Bed Occupancy by Length of Stay," this graphical representation not only
informs but captivates, providing stakeholders with a visually compelling narrative on the
temporal dynamics of healthcare resource utilization.
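A chart of this kind could be produced roughly as follows with matplotlib. This is a sketch, not the project's plotting code: the two quoted shares (27.5% and 24.5%) come from the text, while lumping the remaining classes into one "Other classes" slice, the explode offsets, and the figure size are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Shares quoted in the report, plus one lumped slice for the remaining classes.
labels = ["0-10", "41-50", "Other classes"]
shares = [27.5, 24.5, 48.0]

# "explode" offsets a slice from the centre; 0.0 leaves it in place.
explode = [0.1, 0.1, 0.0]

fig, ax = plt.subplots(figsize=(8, 8))  # enlarged figure for readability
wedges, texts, autotexts = ax.pie(
    shares,
    labels=labels,
    explode=explode,
    autopct="%1.1f%%",  # print each slice's percentage inside the slice
    startangle=90,
)
ax.set_title("Distribution of Bed Occupancy by Length of Stay")
```

In an interactive session, `plt.show()` would display the figure; `fig.savefig(...)` would write it to a file.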
Fig 2: Histogram to show number of beds occupied based on various Departments
Examination of the histogram provides valuable insights into the distribution of patient
admissions across different hospital departments. Notably, the Gynecology department emerges
as the focal point, experiencing the highest influx of patients. This observation underscores a
pronounced demand for specialized gynecological services within the healthcare facility.
Following closely, the Anesthesia and Radiotherapy departments also showcase substantial
admission rates, shedding light on the vital contributions these departments make to overall
patient care. The heightened admission rates in Gynecology may align with demographic trends,
emphasizing the importance of tailored healthcare services for women.
The detailed analysis of the graph highlights the Gyne Department as the predominant
contributor to the hospital's extended duration of stay, with a noteworthy concentration of stays
falling within the 21 to 30-day range. This finding underscores the Gyne Department's pivotal
role in delivering comprehensive medical care and focused attention to patients requiring
prolonged hospitalization.
This observation emphasizes the Gyne Department's substantial involvement in addressing the
healthcare needs of patients with extended stays. The specific concentration within the 21 to
30-day range suggests that the department is actively managing cases that demand a more
thorough and extended medical intervention.
This nuanced insight not only underscores the department's significance but also provides
valuable information for the hospital's strategic planning and resource allocation. By recognizing
and understanding the distinct nature of cases within this duration range, the hospital can better
tailor its services to meet the specific demands of patients who require a more extended and
comprehensive medical care approach.
Fig 4: Histogram plot to see distribution of bed grade in the Hospital
Among all the beds available in the hospital, a notable trend emerges with the 2.0 bed grade,
showcasing the highest count and surpassing the significant milestone of 120,000. This
substantial count underscores the pronounced occupancy and utilization of beds within the 2.0
grade, signifying its pivotal role in accommodating a large volume of patients.
Following closely, bed grades 3.0, 4.0, and 1.0 also exhibit considerable counts, albeit in
descending order. This pattern suggests varying levels of occupancy across different bed
categories, reflecting the diverse needs and requirements of patients seeking medical care at the
hospital.
The dominance of the 2.0 bed grade in terms of count may imply that this specific category
caters to a substantial portion of the patient population, potentially addressing general healthcare
needs or being designated for specific medical conditions. Understanding the distribution of bed
occupancy across different grades is crucial for hospital administrators and planners, as it
provides insights into the demand for various levels of medical care and aids in strategic
resource allocation to optimize patient services effectively.
Fig 5: Count plot to show distribution of bed in various Departments
Upon careful analysis, it becomes evident that the Gyne department stands out with the highest
occupancy rate among all hospital departments, regardless of bed grade distinctions. This
observation highlights the department's significant role in catering to the medical needs of a
substantial number of patients, irrespective of the specific grade of beds.
Of particular note is the finding that the maximum count of Gyne occupants aligns specifically
with the 2.0 bed grade. This emphasizes a distinct and pronounced demand for accommodation
at this particular level within the Gyne department. The convergence of maximum occupancy
with the 2.0 bed grade suggests that patients seeking services from the Gyne department have a
preference or requirement for this specific category of beds.
The correlation between Gyne department's high occupancy and the 2.0 bed grade suggests
opportunities for service optimization. Administrators could allocate resources or upgrade 2.0
bed grade facilities to meet Gyne department's heightened demand effectively.
This insight informs operational decisions and has strategic implications for hospital planning. It
underscores the importance of tailoring infrastructure and services to meet the specific needs of
the Gyne department's patient population, potentially enhancing patient satisfaction and overall
healthcare outcomes.
In summary, the analysis unveils a compelling connection between the Gyne department's
occupancy patterns and the 2.0 bed grade, prompting a deeper exploration of ways to align
resources with the identified demand. This data-driven approach enhances the hospital's capacity
to deliver patient-centered care and underscores the significance of adapting infrastructure to the
unique requirements of each department within the healthcare facility.
Fig 6: Count Plot to show count of Stay duration for each Admission type
A comprehensive analysis of the data underscores the prevailing dominance of the Trauma
Admission category, revealing it as the primary recipient of occupants compared to Emergency
and Urgent Admissions. This prominence sheds light on the distinct and pronounced demand for
medical attention and care within the Trauma category, signifying its critical role in addressing
severe medical cases.
Further delving into Trauma Admissions unravels an additional layer of complexity, exposing a
substantial concentration of stays within the 21 to 30-day duration range. This pattern suggests
an elevated requirement for extended medical care and attention for patients admitted under the
Trauma category. The prevalence of stays in this duration range implies a necessity for a more
prolonged and comprehensive intervention, underscoring the severity and complexity of
trauma-related cases.
Understanding the temporal aspect of Trauma Admissions, particularly the concentration within
the 21 to 30-day duration, is crucial for healthcare administrators. It provides valuable insights
into the nature of care required for trauma patients, informing resource allocation, staffing
decisions, and the development of specialized protocols to ensure optimal patient outcomes.
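The counts behind a plot like this can be tabulated with a cross-tabulation. The sketch below uses a handful of made-up records and assumed column names ("Type of Admission", "Stay"); it only illustrates how the busiest admission type and its most common stay range would be read off such a table.

```python
import pandas as pd

# Made-up records; column names and values are assumptions for illustration.
records = pd.DataFrame({
    "Type of Admission": ["Trauma", "Trauma", "Emergency", "Urgent", "Trauma", "Emergency"],
    "Stay": ["21-30", "21-30", "11-20", "0-10", "11-20", "21-30"],
})

# Rows: admission type, columns: stay class, cells: number of patients.
counts = pd.crosstab(records["Type of Admission"], records["Stay"])

# Admission type with the most patients overall, and its most frequent stay class.
busiest_type = counts.sum(axis=1).idxmax()
modal_stay = counts.loc[busiest_type].idxmax()

print(counts)
print(busiest_type, modal_stay)
```

The same table, with a different pair of columns, would back the other count plots in this chapter (department, severity of illness, age group, ward type).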
Fig 7: Count Plot to show count of Stay duration for Severity of Illness
A comprehensive analysis of the data brings to light a notable trend: the 'Moderate' level of
severity stands out with the highest number of hospital admissions, surpassing both the 'Extreme'
and 'Minor' severity levels. This observation underscores the substantial impact and prevalence
of medical cases falling within the 'Moderate' severity category, indicating the hospital's
crucial role in managing a diverse range of health conditions.
The analysis of patient stays highlights a concentration in the 21 to 30-day range, notably in
cases categorized as 'Moderate' severity. This underscores the significance of 'Moderate' severity,
indicating a heightened demand for comprehensive and extended medical attention. Such
patients likely present conditions requiring thorough and prolonged interventions, emphasizing
the complexity of cases in the 'Moderate' severity category.
Understanding the distribution of severity levels and the associated duration of stays is
instrumental for healthcare administrators in resource planning and service optimization. The
emphasis on the 'Moderate' severity category not only informs staffing decisions but also guides
the development of tailored medical protocols to ensure that patients in this category receive the
necessary attention and care for an optimal recovery.
In summary, the analysis sheds light on the predominance of 'Moderate' severity cases in terms
of hospital admissions, coupled with a concentration of stays in the 21 to 30-day range.
Fig 8: Count Plot to show count of Stay duration for each Age group
The graphical analysis yields valuable insights, revealing that the 31 to 40 age group stands out
as the most prevalent among individuals admitted to the hospital. This finding emphasizes the
significance of healthcare demands within this specific age bracket, indicating a substantial need
for medical attention and services catering to the health concerns of individuals in their thirties.
Within the prominent 31 to 40 age group, a significant pattern emerges with a notable proportion
experiencing extended stays of 21 to 30 days. This suggests complex health conditions requiring
comprehensive and prolonged interventions.
Understanding the healthcare dynamics of the 31 to 40 age group is vital for hospital
administrators and healthcare providers. It facilitates strategic planning, resource allocation, and
the development of specialized care protocols tailored to the specific needs of this demographic.
In conclusion, the insights derived from the graph not only highlight the predominance of the 31
to 40 age group in hospital admissions but also draw attention to the need for in-depth and
extended medical care within this demographic, guiding healthcare professionals in optimizing
services and ensuring the well-being of patients in this age range.
Fig 9: Count Plot to show count of Age group for various Department
The analysis reveals a significant trend, highlighting the Gyne (Gynecology) department as the
primary attractor of visitors across all age groups. Particularly noteworthy is the robust influx of
visitors aged 31 to 40, exceeding 50,000. This underscores a pronounced demand for
gynecological services in this demographic, emphasizing the need for specialized healthcare
catering to reproductive health concerns.
The Gyne department's prominence across age groups emphasizes its critical role for diverse
demographics. Exceptional visits in the 31 to 40 age group indicate a heightened demand for
gynecological services during this life stage. Recognizing this pattern is crucial for
administrators, guiding strategic planning and resource allocation to meet distinctive healthcare
needs in the thirties.
Moreover, the robust visitor count in the Gyne department across all age groups underscores its
significance as a central component of comprehensive women's health services. This data-driven
insight guides healthcare professionals in tailoring services, allocating resources effectively, and
enhancing overall care quality within the Gynecology department.
In conclusion, the analysis illuminates the Gyne department's universal appeal, with substantial
demand in the 31 to 40 age group, providing a foundation for informed decision-making in
healthcare management to better meet the needs of patients in this demographic.
Fig 10: Hist Plot to show Distribution of Age group in the Dataset
The graphical representation highlights compelling patterns, with the age groups 31 to 40 and 41
to 50 exhibiting the highest rates of hospital visits and stays. This observation underscores the
substantial influx of patients within these specific age brackets, emphasizing the significance of
healthcare needs for individuals in their thirties and forties.
The prominence of hospital visits and stays in these age groups emphasizes the importance of
understanding and addressing health challenges prevalent during these life stages. The data
suggests a heightened demand for medical services and attention within these age ranges,
reflecting a combination of preventive care, chronic condition management, and addressing
health issues common during this phase of adulthood.
Recognizing increased hospital utilization in these age brackets is crucial for healthcare
administrators and providers. It informs resource allocation, staffing decisions, and the
development of targeted healthcare initiatives to meet the specific needs of individuals in their
thirties and forties. Moreover, it underscores the importance of comprehensive and specialized
care tailored to health concerns prevalent in these life stages.
In conclusion, the insights derived from the graph provide a foundation for healthcare
professionals to optimize services and deliver patient-centered care for individuals within the age
groups of 31 to 40 and 41 to 50.
Fig 11: Count Plot to show count of Stay duration for Ward Type in the Hospital
Examining hospital occupancy patterns reveals differences among wards, with R, S, and Q
showing substantial rates, while T and U exhibit minimal occupancy. This variance suggests
distinct utilization levels across hospital units.
Notably, Ward R accommodates the highest number of patients aged 21 to 30, indicating a
pronounced demand for medical services among young adults. Understanding these
demographics is crucial for administrators. For Ward R, recognizing the demand from the 21 to
30 age group enables targeted resource allocation and tailored medical services.
The contrast in occupancy rates emphasizes the need to optimize hospital resources based on
observed utilization patterns. High-demand wards, like R, may require additional attention to
ensure efficient and quality healthcare delivery.
In summary, the analysis identifies variations in occupancy rates among wards, emphasizing
specific demand in Ward R for young adults. These insights inform healthcare management
decisions, enhancing services and providing focused care based on observed occupancy patterns.
2.3 Feature Engineering & Selection
The feature selection and engineering process involves scaling features and iteratively refining
the selection for optimal model performance. Let us break down the steps and concepts involved in
this process:
1. Chi-Squared Test:
The chi-squared test, a statistical tool, gauges the significance of association between two
categorical variables. In feature selection, it evaluates if features are independent of the target
variable, aiding in the identification of influential predictors. This method is particularly valuable
for filtering out less relevant features in the pursuit of constructing more effective predictive
models.
It's important to note that while feature selection techniques like chi-squared can help identify
potentially irrelevant features, the decision to include or exclude features should also consider
domain knowledge, potential multicollinearity, and the overall impact on model performance.
Sometimes, even features with weak associations can contribute to model robustness or capture
nuanced patterns.
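To make the test concrete, the sketch below computes the chi-squared statistic for one categorical feature against the target using only the standard library. The function name `chi_squared` and the toy data are assumptions; this is not the project's selection code. Library routines such as `scipy.stats.chi2_contingency` additionally return a p-value.

```python
from collections import Counter


def chi_squared(feature, target):
    """Return (statistic, degrees_of_freedom) for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # cell counts of the contingency table
    row_totals = Counter(feature)
    col_totals = Counter(target)

    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Count expected in this cell if feature and target were independent.
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


stay = ["short", "short", "long", "long"]
related = ["A", "A", "B", "B"]    # moves together with the target
unrelated = ["A", "B", "A", "B"]  # evenly spread across the target

print(chi_squared(related, stay))    # larger statistic: stronger association
print(chi_squared(unrelated, stay))  # statistic of zero: no association
```

A larger statistic (for the same degrees of freedom) indicates stronger evidence against independence, so features with very small statistics are the candidates for removal.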
CHAPTER 4: CONCLUSION & FUTURE SCOPE
The project implements various machine learning algorithms, specifically tree-based and
ensemble-based techniques, to predict a patient's length of stay at the hospital at an early
stage. Evaluation metrics such as accuracy, precision, recall, and F1-score were employed to
compare the performance of the different algorithms. The study aims to empower healthcare
institutions to estimate LOS, enabling proactive measures to be taken before admitting the
patients.
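For reference, the four metrics can be computed as in the sketch below. Macro averaging over the classes (an unweighted mean of the per-class scores) is an assumption, since the report does not say which averaging was used; the function name and the toy labels are also illustrative.

```python
def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for multi-class labels."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


y_true = ["0-10", "0-10", "11-20", "11-20"]
y_pred = ["0-10", "11-20", "11-20", "11-20"]
scores = classification_scores(y_true, y_pred)
print(scores)
```

With 11 imbalanced LOS classes, accuracy alone can look good while small classes are predicted poorly, which is why the per-class metrics are worth reporting alongside it.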
Among the algorithms assessed, the Random Forest Classifier also demonstrated strong
performance, particularly in achieving a high accuracy score, positioning it as a viable
alternative for predicting LOS.
The experimentation involved the analysis of a dataset comprising 318,438 records, enabling a
robust examination of the data patterns and behaviors that contribute to length of stay. The
research employs various data pre-processing techniques and visualization methods to enhance
data understanding and pattern identification.
The core objective of the project is to establish a model that provides accurate and stable
performance. With an achieved accuracy exceeding 85%, the study sets the foundation for
further improvements. Future efforts could focus on hyperparameter optimization for ensemble
methods, potentially pushing the accuracy beyond 90%.
In conclusion, the project not only presents a comprehensive exploration of machine learning
techniques for stay prediction in a healthcare context but also outlines potential avenues for
refining and advancing the predictive model, thus contributing to more effective prediction of
patients' length of stay.
Appendix:
Data Cleaning