DSML PROJECT REPORT: Harshit

A Progress Report
on
Breast Cancer
carried out as part of the course
Submitted by
Name of student: Harshit Sharma
Roll No.: 209301572
VI-CSE
in partial fulfilment of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
Computer Science & Engineering

1. Introduction to the Problem
Breast cancer is a significant global health concern, affecting millions of people
worldwide. Medical imaging techniques, such as mammography, are commonly used for the
early detection and diagnosis of breast cancer. With advances in machine learning (ML)
and artificial intelligence (AI), breast cancer ML projects have emerged as a promising approach
to improving the accuracy and efficiency of breast cancer detection, diagnosis, and treatment.
Breast cancer ML projects typically involve training ML algorithms on large datasets of breast
imaging data, such as mammograms, MRI scans, and ultrasound images, along with associated
clinical and patient data. These ML algorithms can then be used to analyze new breast imaging
data and provide insights and predictions related to breast cancer detection, risk assessment,
tumor classification, treatment planning, and patient outcomes.
The goals of breast cancer ML projects include improving the accuracy and speed of breast
cancer diagnosis, reducing false positives and false negatives, identifying high-risk patients who
may benefit from targeted interventions, and facilitating personalized treatment plans. Breast
cancer ML projects also have the potential to enhance the efficiency of radiologists and other
healthcare professionals by providing automated and data-driven decision support tools.
Breast cancer ML projects often involve a multidisciplinary approach, bringing together
expertise in machine learning, medical imaging, oncology, and clinical research. These projects
may be conducted in collaboration with healthcare institutions, research organizations, and
industry partners, and may involve the use of diverse ML techniques, such as deep learning,
feature extraction, and pattern recognition, to analyze complex breast imaging data.
The ultimate goal of breast cancer ML projects is to improve breast cancer outcomes, including
early detection, accurate diagnosis, optimal treatment planning, and better patient care.
However, many breast cancer ML systems are still at the research and
development stage, and their use in clinical practice is typically subject to regulatory approval and
validation through rigorous clinical trials. Nevertheless, breast cancer ML projects hold
considerable promise for improving breast cancer care and making a positive impact on the lives of
patients affected by this disease.
2. Dataset Description
The breast cancer database is a comprehensive collection of structured and organized data
related to breast cancer, including clinical, imaging, genetic, and demographic information. It
serves as a valuable resource for researchers, clinicians, and healthcare professionals working in
the field of breast cancer research, diagnosis, treatment, and prevention.
The breast cancer database typically includes various types of data, such as:
Clinical Data: This includes patient demographics (e.g., age, gender, race), medical history (e.g.,
family history of breast cancer, previous breast biopsies, hormonal status), and clinical findings
(e.g., breast tumor characteristics, lymph node involvement, stage of breast cancer).
Imaging Data: This includes imaging studies, such as mammograms, MRI scans, ultrasound
images, and other radiological findings related to breast cancer. Imaging data may include
information on tumor size, location, shape, and other characteristics that are important for
breast cancer diagnosis and staging.
Genetic Data: This includes genetic information, such as gene mutations (e.g., BRCA1, BRCA2),
gene expression profiles, and other genetic markers associated with breast cancer risk,
prognosis, and treatment response.
Treatment Data: This includes information on the type of treatment received by breast cancer
patients, such as surgery (e.g., lumpectomy, mastectomy), radiation therapy, chemotherapy,
hormonal therapy, and targeted therapies. Treatment data may also include details on
treatment outcomes, adverse effects, and follow-up care.
Follow-up Data: This includes information on patient follow-up visits, disease progression,
survival rates, and long-term outcomes of breast cancer patients.
Research Data: This includes data generated from breast cancer research studies, such as
clinical trials, epidemiological studies, and translational research, which may provide valuable
insights into breast cancer risk factors, treatment efficacy, and outcomes.
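To make the categories above more concrete, the following Python sketch shows one possible way a single patient record could group clinical, imaging, genetic, treatment and follow-up data. All class and field names (PatientRecord, ClinicalData, tumor_size_mm and so on) are hypothetical choices made for this illustration. They are not the schema of any particular breast cancer database, and research data is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema for illustration only; field names are not from a real database.

@dataclass
class ClinicalData:
    age: int
    gender: str
    family_history: bool                      # family history of breast cancer
    lymph_node_involvement: Optional[bool] = None
    stage: Optional[str] = None               # e.g. "II"; None if not yet staged

@dataclass
class ImagingStudy:
    modality: str                             # e.g. "mammogram", "MRI", "ultrasound"
    tumor_size_mm: Optional[float] = None
    location: Optional[str] = None

@dataclass
class GeneticData:
    brca1_mutation: Optional[bool] = None     # None means "not tested"
    brca2_mutation: Optional[bool] = None

@dataclass
class PatientRecord:
    patient_id: str
    clinical: ClinicalData
    imaging: List[ImagingStudy] = field(default_factory=list)
    genetic: GeneticData = field(default_factory=GeneticData)
    treatments: List[str] = field(default_factory=list)       # e.g. "lumpectomy"
    follow_up_notes: List[str] = field(default_factory=list)

# Made-up example values
record = PatientRecord(
    patient_id="P-001",
    clinical=ClinicalData(age=52, gender="F", family_history=True),
    imaging=[ImagingStudy(modality="mammogram", tumor_size_mm=14.0)],
)
print(record.imaging[0].modality, record.genetic.brca1_mutation)
```

Using default_factory for the list and nested fields gives each record its own empty containers, so records never share mutable state.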
3. Algorithm Used
4. Data Visualization
Breast cancer is a complex disease that requires thorough analysis and understanding for
effective diagnosis and treatment. Data visualization techniques play a crucial role in simplifying
complex information and providing meaningful insights from breast cancer data. Two
commonly used data visualization techniques in breast cancer research and analysis are the
confusion matrix and the ROC curve.
5. Result Analysis
The results are analyzed on the basis of:
a) Confusion matrix
b) ROC curve
Confusion Matrix:
A confusion matrix is a tabular representation that shows the predicted and actual outcomes of
a classification model. It is commonly used to evaluate the performance of a machine learning
model in breast cancer research or analysis. For a binary classifier, the confusion matrix consists of four
values: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
True Positive (TP) represents the number of instances that are actually positive and were
predicted as positive by the model. True Negative (TN) represents the number of instances that
are actually negative and were predicted as negative by the model. False Positive (FP)
represents the number of instances that are actually negative but were predicted as positive by
the model, also known as a Type I error. False Negative (FN) represents the number of
instances that are actually positive but were predicted as negative by the model, also known as
a Type II error.
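The four counts can be tallied directly from paired actual and predicted labels. The sketch below assumes binary labels encoded as 1 for positive (for example, malignant) and 0 for negative (for example, benign); the function name confusion_counts and the example labels are hypothetical and do not come from this project.

```python
from typing import Dict, Sequence

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if actual == 1 and predicted == 1:
            counts["TP"] += 1   # positive case correctly predicted positive
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1   # negative case correctly predicted negative
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1   # Type I error
        else:
            counts["FN"] += 1   # Type II error: actual 1, predicted 0
    return counts

# Made-up labels for illustration, not results from this project
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```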
The values in the confusion matrix can be used to calculate various evaluation parameters, such
as accuracy, sensitivity (also called recall), specificity, and F1 score, which provide insights into
the performance of the model.
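The evaluation parameters follow from the four counts: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and F1 = 2 * precision * sensitivity / (precision + sensitivity), where precision = TP / (TP + FP). The Python sketch below computes them. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here rather than a universal rule.

```python
from typing import Dict

def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute evaluation parameters from confusion-matrix counts."""
    def safe_div(num: float, den: float) -> float:
        # Assumed convention: report 0.0 when the denominator is zero
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    sensitivity = safe_div(tp, tp + fn)   # recall, true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)     # needed for the F1 score
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Made-up counts: TP=3, TN=3, FP=1, FN=1 gives 0.75 for every parameter
print(evaluation_metrics(tp=3, tn=3, fp=1, fn=1))
```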
ROC Curve:
An ROC curve is a graphical representation of the performance of a binary classification model.
It displays the trade-off between the true positive rate (sensitivity) and the false positive rate (1 -
specificity) across different classification thresholds. The ROC curve is obtained by calculating
sensitivity and specificity at each classification threshold and plotting sensitivity (y-axis) against
1 - specificity (x-axis).
Sensitivity (also called recall) is the proportion of actual positive cases that are correctly
identified by the model. Specificity is the proportion of actual negative cases that are correctly
identified by the model. The ROC curve helps in visualizing the performance of the model in
terms of its ability to correctly identify both positive and negative cases at different
classification thresholds.
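The threshold sweep described above can be sketched in plain Python. Each distinct model score is used as a threshold, a case is predicted positive when its score is at least the threshold, and the resulting false positive rate (1 - specificity) and true positive rate (sensitivity) form one point on the curve. The function name roc_points and the example scores are hypothetical; scikit-learn's sklearn.metrics.roc_curve provides a library implementation of this computation.

```python
from typing import List, Sequence, Tuple

def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for every distinct score used as a threshold.

    A case is predicted positive when its score is >= the threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = sum(1 for y in y_true if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    # Start at (0, 0): a threshold above every score predicts nothing as positive
    points = [(float("inf"), 0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        points.append((threshold, fp / negatives, tp / positives))  # (1 - specificity, sensitivity)
    return points

# Made-up labels and model scores (e.g. predicted probability of malignancy)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for threshold, fpr, tpr in roc_points(y_true, scores):
    print(f"threshold={threshold}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting the FPR values on the x-axis against the TPR values on the y-axis gives the ROC curve; a classifier that guesses at random falls along the diagonal from (0, 0) to (1, 1).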
• Evaluation parameters for three train:test ratios: 70:30, 80:20 and 60:40 (table of evaluation parameters)
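Comparing the evaluation parameters across the three ratios requires splitting the same data three times. The sketch below shows one way to do this with a reproducible shuffle; the dataset size of 100, the seed of 42 and the helper names are assumptions for illustration. The resulting train and test indices would then be used to fit the model and compute the confusion-matrix metrics for each ratio. scikit-learn's sklearn.model_selection.train_test_split (with its test_size parameter) offers equivalent functionality.

```python
import random
from typing import Dict, List, Tuple

def train_test_indices(n_samples: int, test_fraction: float, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # fixed seed: same split every run
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# Train:test ratios from the report, expressed as test-set fractions
RATIOS = {"70:30": 0.30, "80:20": 0.20, "60:40": 0.40}

def split_sizes(n_samples: int) -> Dict[str, Tuple[int, int]]:
    """Return (train size, test size) for each ratio."""
    sizes = {}
    for name, test_fraction in RATIOS.items():
        train_idx, test_idx = train_test_indices(n_samples, test_fraction)
        sizes[name] = (len(train_idx), len(test_idx))
    return sizes

# Hypothetical dataset size of 100 samples
print(split_sizes(100))  # {'70:30': (70, 30), '80:20': (80, 20), '60:40': (60, 40)}
```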
