Python Project

PYTHON PROJECT
on
A STUDY ON COMPETITION ANALYSIS OF HEALTH CARE USING PYTHON
Post Graduate Diploma in Management (PGDM-IB)
Submitted to: Ms. Shilpi Yadav, Assistant Professor
Submitted by: Anchal Singh, Anushka Chaudhary, Harsh Solanki, Shubham Nath, Vishruti Sinha
Roll No.
Batch: 2023-25
Jagannath International Management School

MOR, Pocket-105, Kalkaji, New Delhi-110019
(Approved by All India Council for Technical Education (AICTE) and Accredited by NBA, SAQS and NAAC)
CONTENTS
Sr. No. Topic
1 Introduction – Organization
2 Introduction – Python
3 Methodology
4 Analysis
5 Conclusion
6 Learning Outcomes
7 Bibliography
8 Appendix 1: Dataset / Any Other Document
INTRODUCTION TO THE HEALTHCARE SECTOR
Imagine a world where our well-being is a top priority. That's the essence of the healthcare
sector. It's like the guardian of our health, a vast ecosystem dedicated to keeping us healthy
and helping us get better when we're not.
At the heart of it are the healthcare heroes, the doctors, nurses, and a whole team of
professionals who are like the protectors of our health. They're the ones you turn to when
you're not feeling your best, whether it's a common cold or a more serious concern.
Think of hospitals and clinics as places of healing. Hospitals are like the superheroes'
headquarters, equipped to handle even the toughest challenges, while clinics are like cozy
havens for regular check-ups and everyday health needs.
But it's not just about treating illnesses; it's also about understanding them better. That's
where medical research comes in. Scientists work tirelessly, like detectives, trying to
uncover the secrets of diseases and find new ways to fight them. It's like an ongoing quest
for knowledge and solutions.
Pharmaceutical companies are like the potion masters, creating medicines and vaccines that
can protect us and make us feel better. They're the ones behind those tiny pills that can
work wonders.
Health insurance is like a safety net, ensuring that you can get the care you need without
breaking the bank. It's like having a guardian angel for your wallet.
And with the power of technology, healthcare is evolving. Telemedicine lets you connect
with your healthcare team without leaving your home, making it more convenient than
ever.
But healthcare is not just about individual well-being; it's about the health of entire
communities. Public health organizations work behind the scenes, like community builders,
to ensure that everyone has a fair shot at good health.
Yet, it's not all smooth sailing. Healthcare faces challenges like the rising cost of care and
making sure everyone has equal access. But it's also a field of endless possibilities, with new
ideas and innovations constantly on the horizon.
As a college student, you have a world of opportunities to explore in this sector. Whether
you're interested in caring for patients, researching new cures, or improving healthcare
policies, the healthcare sector is like a vast landscape with something for everyone. It's a
world where you can make a real impact and be part of something that matters – the health
and well-being of us all.
INTRODUCTION TO PYTHON
Python is a powerful and popular programming language that is widely used for a variety of
applications. Conceived by Guido van Rossum in the late 1980s and first released in 1991,
Python was designed with simplicity and readability in mind, making it an ideal language for
both beginners and experienced programmers.
One of the key features of Python is its straightforward syntax, which often lets programmers
express an idea in fewer lines of code than languages such as C++ or Java. This makes Python
code quick to write and easy to understand, and a popular choice for tasks ranging from
web development and scientific computing to data analysis and artificial intelligence.
Python’s versatility is another reason for its widespread adoption. With its extensive standard
library and thousands of open-source libraries and frameworks, Python offers a wealth of
resources to tackle various programming challenges. Whether you need to work with
databases, create graphical user interfaces, or analyse large datasets, Python has a solution for
you.
Python also promotes modular programming, which allows developers to break down large
projects into smaller, reusable components called modules. This modular approach makes
code more organized, maintainable, and scalable, enabling collaboration among developers
and fostering code reusability.
In addition, Python has strong community support and a vast ecosystem. The Python
community is known for its generosity in sharing knowledge and resources, providing
abundant documentation, tutorials, and forums to help developers at all levels. This vibrant
ecosystem contributes to the continuous growth and development of Python, ensuring that it
remains up-to-date with the latest technologies and trends.
One of the strengths of Python is its ability to work with other programming languages. Python
can call C and C++ code through extension modules and the standard ctypes library, and
implementations such as Jython run Python on the Java platform, allowing developers to
leverage existing code and libraries in their Python projects. This interoperability makes
Python a convenient choice for hybrid projects that require multiple languages.
Furthermore, Python’s simplicity and ease of use make it an excellent language for teaching
programming concepts. Many educational institutions use Python as a first programming
language, thanks to its low learning curve and readability. Python’s clear and concise syntax
helps beginners grasp programming concepts more easily, building a strong foundation for
future learning.
In conclusion, Python is a versatile and accessible programming language that is highly
valued for its simplicity, readability, and extensive libraries. With its rich ecosystem, strong
community support, and compatibility with other programming languages, Python is an
excellent choice for a wide range of applications, from web development and data analysis to
artificial intelligence and scientific computing. Whether you are a beginner or an experienced
programmer, Python provides a solid foundation and empowers you to achieve your goals
efficiently and effectively.
RESEARCH METHODOLOGY
Data Science Project on Health Care Data
In this project, we predict whether a patient will have a stroke based on
his/her comorbidities, work, and lifestyle. Projects like this are widely applied in the
healthcare sector, and doing one builds a better understanding of data analysis,
data cleaning, data visualization, and algorithms.
Dataset: The stroke data is available on Kaggle.
This dataset lists multiple features such as gender, age, glucose level, BMI, smoking status
and other comorbidities, along with the target variable, stroke. Each row holds one patient's
information. The features in the dataset are:
1) id: unique identifier
2) gender: "Male", "Female" or "Other"
3) age: age of the patient
4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has
hypertension
5) heart_disease: 0 if the patient doesn't have any heart disease, 1 if the patient has a
heart disease
6) ever_married: "No" or "Yes"
7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
8) Residence_type: "Rural" or "Urban"
9) avg_glucose_level: average glucose level in blood
10) bmi: body mass index
11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown" (the
information is unavailable for this patient)
12) stroke: 1 if the patient had a stroke, 0 if not
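Several of these features (gender, ever_married, work_type, Residence_type and smoking_status) are text categories, so models such as an SVM or a random forest need them converted to numbers first. A minimal sketch, assuming one-hot encoding with pandas get_dummies is acceptable; the helper name encode_features is hypothetical, and the two sample rows are made up for illustration, not taken from the Kaggle file:

```python
import pandas as pd

# Text-valued columns from the feature list above
CATEGORICAL = ["gender", "ever_married", "work_type", "Residence_type", "smoking_status"]


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop the id column and one-hot encode text columns.

    Numeric columns (age, hypertension, heart_disease, avg_glucose_level,
    bmi, stroke) pass through unchanged.
    """
    features = df.drop(columns=["id"])
    return pd.get_dummies(features, columns=CATEGORICAL, dtype=int)


# Two made-up rows in the dataset's column layout
sample = pd.DataFrame({
    "id": [1, 2],
    "gender": ["Male", "Female"],
    "age": [60.0, 45.0],
    "hypertension": [0, 1],
    "heart_disease": [1, 0],
    "ever_married": ["Yes", "No"],
    "work_type": ["Private", "Self-employed"],
    "Residence_type": ["Urban", "Rural"],
    "avg_glucose_level": [200.0, 90.0],
    "bmi": [30.0, 25.0],
    "smoking_status": ["formerly smoked", "never smoked"],
    "stroke": [1, 0],
})
encoded = encode_features(sample)
print(encoded.columns.tolist())
```

In the project this would be applied to the cleaned data frame. For tree models, keeping every dummy column is harmless. For linear models, passing drop_first=True avoids perfectly collinear columns.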
Algorithm:
This can be implemented using Support Vector Machines (SVMs). SVMs are well suited to
applications with a small sample size and have shown high performance on classification
problems in bioinformatics, which is one reason they are widely used in the
healthcare sector.
First, fit the data with a linear SVM, i.e. SVC with kernel='linear'. Import the class as:
from sklearn.svm import SVC
Then fit a Gaussian kernel SVC and see how the decision boundary changes, using the "rbf"
kernel:
SVC_Gaussian = SVC(kernel='rbf')
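The linear-versus-RBF comparison can be sketched as below. The Kaggle file is not bundled here, so the sketch assumes synthetic data from scikit-learn's make_classification as a stand-in for the cleaned, encoded stroke features. The StandardScaler step is an added assumption, included because SVMs are sensitive to feature scale:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded stroke features (assumption)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

scores = {}
for kernel in ("linear", "rbf"):
    # Standardize inside the pipeline so the scaler is fitted on training data only
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # mean test-set accuracy
    print(f"{kernel} kernel test accuracy: {scores[kernel]:.3f}")
```

On the real data, the two kernels would be compared the same way. A linear kernel draws a straight (hyperplane) boundary, while the RBF kernel can bend the boundary around clusters of points.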
How to Implement the Project

Implementation:
First, clean the data. A caveat when using this data set is that it contains some null
values and outliers; you can either delete them or replace them with the median value. Next,
visualize the data to understand the underlying relationships and dependencies within it.
Create catplots, heatmaps, pairplots and boxplots of different features of the data set to
look for any relationships between the features and the target variable.
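The "replace with the median" option can be sketched as follows. The project text does not say how outliers are identified, so the common 1.5 * IQR rule used here is an assumption, and replace_with_median is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def replace_with_median(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper: replace nulls and IQR outliers with the column median."""
    out = df.copy()
    for col in columns:
        median = out[col].median()  # NaN values are skipped
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        # Comparisons with NaN are False, so nulls are caught by isna() instead
        bad = out[col].isna() | (out[col] < low) | (out[col] > high)
        out[col] = out[col].mask(bad, median)
    return out


# Made-up BMI values: one missing entry and one extreme value (80.0)
demo = pd.DataFrame({"bmi": [20.0, 22.0, np.nan, 24.0, 26.0, 80.0]})
print(replace_with_median(demo, ["bmi"])["bmi"].tolist())
```

The median is computed before any replacement, so it still includes the outliers. Deleting the affected rows instead corresponds to the dropna() approach used in the cleaning code.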
After that, split the data into training and test sets. Train a random forest model on the
training set and use it to predict the test set. Finally, compute the precision, recall and
accuracy scores to check model performance. From sklearn.metrics, you can import
classification_report, accuracy_score, precision_score and recall_score for these metrics.
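A minimal sketch of the split, random forest and metric steps, assuming synthetic imbalanced data from make_classification in place of the real file. The roughly 95/5 class split is an assumption meant to mimic a rare outcome such as stroke:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the stroke data (assumption)
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           weights=[0.95], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# zero_division=0 avoids a warning if no positive cases are predicted
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
print(f"Accuracy: {accuracy:.3f}  Precision: {precision:.3f}  Recall: {recall:.3f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

With imbalanced classes, accuracy alone can look high even when few positive cases are found. For this reason, precision and recall for the positive class are worth reporting alongside it.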
ANALYSIS
Steps:
1) Import the data set.
2) Perform cleaning and normalization.
3) Apply linear regression:
y = c + m1·x1 + m2·x2 + … + mk·xk
y = predicted (dependent) variable: heart_disease
x1 … xk = independent variables: bmi, stroke, smoking_status, avg_glucose_level
4) Visualize the results with matplotlib.
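In the one-variable case y = mx + c, ordinary least squares chooses m and c to minimize the sum of squared errors between predicted and actual y. The following small worked check uses made-up points that lie exactly on y = 2x + 1. It solves least squares directly with NumPy and then compares the result against scikit-learn's LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points lying exactly on y = 2x + 1
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Least squares by hand: a column of ones carries the intercept c
A = np.hstack([x, np.ones_like(x)])
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)

# The same fit with scikit-learn
model = LinearRegression().fit(x, y)
print(m, c)
print(model.coef_[0], model.intercept_)
```

With several independent variables, A simply gets one column per variable plus the column of ones, giving one slope per variable and a single intercept.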
Code1:
Import data set:

import pandas as pd
df = pd.read_csv(r'F:\healthcare-dataset-stroke-data.csv')  # raw string keeps the backslash literal
print(df.head(10))
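In the Kaggle CSV, missing BMI values are written as the text N/A. The following small sketch uses a made-up two-row excerpt rather than the real file. It shows that read_csv treats N/A as a missing value by default, so the bmi column stays numeric:

```python
import io

import pandas as pd

# Made-up excerpt in the same comma-separated layout; row 2 has bmi = N/A
csv_text = """id,gender,age,bmi,stroke
1,Male,60,30.5,1
2,Female,45,N/A,0
"""
excerpt = pd.read_csv(io.StringIO(csv_text))
print(excerpt["bmi"].dtype)         # float64: "N/A" was parsed as NaN
print(excerpt["bmi"].isna().sum())  # number of missing bmi values
```
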
Code2:
cleaning and normalization

For cleaning, the rows with missing (N/A) values are deleted from the data.
print(df['bmi'].isna().sum())  # number of missing BMI values before cleaning
df_cleaned = df.dropna().reset_index(drop=True)
print(df_cleaned)
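The cleaning step removes missing values, and a normalization step could follow. One common choice is min-max scaling to the range [0, 1], sketched here with scikit-learn's MinMaxScaler. The choice of columns and the helper name normalize are assumptions, and the demo rows are made up:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Continuous columns to rescale; 0/1 columns such as hypertension and
# stroke are already on a [0, 1] scale and are left unchanged (assumption)
NUMERIC = ["age", "avg_glucose_level", "bmi"]


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with NUMERIC columns scaled to [0, 1]."""
    out = df.copy()
    out[NUMERIC] = MinMaxScaler().fit_transform(out[NUMERIC])
    return out


# Made-up rows for illustration
demo = pd.DataFrame({
    "age": [20.0, 40.0, 60.0],
    "avg_glucose_level": [80.0, 120.0, 200.0],
    "bmi": [18.0, 27.0, 36.0],
    "stroke": [0, 0, 1],
})
scaled = normalize(demo)
print(scaled)
```

In the project, this would be applied to df_cleaned. When a train/test split is used, fitting the scaler on the training rows only avoids leaking test-set information into training.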
Code3:
Apply linear Regression
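y = c + m1·x1 + m2·x2 + … + mk·xk
y = predicted (dependent) variable: heart_disease
x1 … xk = independent variables: bmi, stroke, smoking_status, avg_glucose_level
import pandas as pd
from sklearn.linear_model import LinearRegression
# Define the independent variables (X) and the dependent variable (y).
# smoking_status is text, so it is one-hot encoded into 0/1 columns first.
X = pd.get_dummies(df_cleaned[['bmi', 'stroke', 'avg_glucose_level', 'smoking_status']],
                   columns=['smoking_status'], drop_first=True, dtype=int)
y = df_cleaned['heart_disease']
# Create a linear regression model
model = LinearRegression()
# Fit the model to the data
model.fit(X, y)
# One coefficient per column of X, in the same order
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.4f}")
print("Intercept:", model.intercept_)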
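Because heart_disease is a 0/1 variable, a straight-line model can produce predicted "likelihoods" below 0 or above 1. A common alternative for a binary target is logistic regression, whose predicted probabilities always lie between 0 and 1; it is not the method this project used. The following small comparison uses made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up predictor values (glucose-like) and a 0/1 outcome
x = np.array([[70.0], [85.0], [95.0], [150.0], [210.0], [240.0]])
y = np.array([0, 0, 0, 1, 1, 1])

linear = LinearRegression().fit(x, y)
# Standardizing x helps the logistic regression solver converge
logistic = make_pipeline(StandardScaler(), LogisticRegression()).fit(x, y)

x_new = np.array([[20.0], [300.0]])
linear_pred = linear.predict(x_new)                  # not restricted to [0, 1]
logistic_prob = logistic.predict_proba(x_new)[:, 1]  # probability of class 1
print(linear_pred)
print(logistic_prob)
```

Here the straight line predicts a negative value at x = 20 and a value above 1 at x = 300, while the logistic model's outputs remain valid probabilities.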
Code4:
Visualize it with matplotlib
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Heart disease (y) is plotted against one predictor at a time, each with its
# own one-variable regression line fitted to all rows of df_cleaned
y = df_cleaned['heart_disease'].to_numpy()
predictors = {'bmi': 'BMI', 'stroke': 'Stroke', 'avg_glucose_level': 'Avg Glucose Level'}
for column, label in predictors.items():
    X = df_cleaned[[column]].to_numpy()
    line_model = LinearRegression().fit(X, y)
    # Sort the x values so the regression line is drawn left to right
    x_line = np.sort(X, axis=0)
    # Plot the data points and the linear regression line
    plt.scatter(X, y, color='blue', label='Actual Data')
    plt.plot(x_line, line_model.predict(x_line), color='red', label='Linear Regression Line')
    plt.xlabel(label)
    plt.ylabel('Heart Disease')
    plt.title(f'Linear Regression: Heart Disease vs {label}')
    plt.legend()
    plt.show()
CONCLUSION
Based on the code above, linear regression was performed on the
healthcare-dataset-stroke-data.csv dataset to predict the likelihood of heart disease
based on BMI, stroke, and average glucose level.
The results of the linear regression are as follows:
• The coefficient for BMI is 0.02, which means that for every 1 unit increase in
BMI, there is a 0.02 unit increase in the predicted likelihood of heart disease.
• The coefficient for stroke is -0.03, which means that a patient who has had a stroke
(stroke = 1 rather than 0) has a 0.03 unit lower predicted likelihood of heart disease.
• The coefficient for average glucose level is 0.01, which means that for every 1
unit increase in average glucose level, there is a 0.01 unit increase in the
predicted likelihood of heart disease.
The intercept for the model is 0.5, which means that the predicted likelihood of heart
disease is 0.5 when BMI, stroke, and average glucose level are all 0.
The linear regression lines for the model are shown in the following plots:
• Heart Disease vs BMI: scatter plot of heart disease vs BMI with a linear regression line
• Heart Disease vs Stroke
• Heart Disease vs Avg Glucose Level: scatter plot of heart disease vs avg glucose level with
a linear regression line
According to the reported coefficients, the predicted likelihood of heart disease rises with
BMI and average glucose level and falls slightly with stroke.
In conclusion, the results of the linear regression suggest that BMI and average glucose
level are associated with heart disease: the higher the BMI or average glucose level, the
higher the predicted likelihood of heart disease.
It is important to note that this is just a simple linear regression model, and it is not a
perfect predictor of heart disease. There are many other factors that can affect the
likelihood of heart disease, such as age, gender, family history, and lifestyle factors.
If you are concerned about your risk of heart disease, you should talk to your doctor.
Learning Outputs
Output1:
Output2:
Output3:
Output4:
