
A
Summer Training Report
on
Classification of Breast Cancer Histopathological Images

Indian Institute of Technology (BHU), Varanasi
2019

Submitted to:                                      Submitted by:
Dr. Sanjay Kumar Singh                             Amit Kumar Maurya
Professor                                          3rd Year B.Tech (CSE)
Department of Computer Science and Engineering     NIT Manipur
IIT (BHU), Varanasi

DECLARATION

I hereby declare that the project report titled “Classification of Breast Cancer
Histopathological Images”, submitted by me to the Department of Computer Science
and Engineering, Indian Institute of Technology (BHU), Varanasi, in fulfilment of the
requirement for the award of the certificate for Summer Internship 2019, is a record of
project work carried out by me under the guidance of Prof. Sanjay Kumar Singh,
Department of Computer Science and Engineering, IIT (BHU). I further declare that the
work reported in this project has not been submitted, and will not be published or
submitted, in part or in full, for the award of any certificate in this or any other
institute or university.

Place: IIT (BHU), Varanasi
Date:

Name:
Signature of the Candidate

CERTIFICATE

This is to certify that the work contained in this report entitled “Classification of
Breast Cancer Histopathological Images”, submitted by Amit Kumar Maurya, an
undergraduate of 3rd year B.Tech in Computer Science and Engineering at the
National Institute of Technology, Manipur, embodies the result of authentic work
carried out in the Department of Computer Science and Engineering, Indian Institute
of Technology (BHU), Varanasi, and represents good work carried out under my supervision.

Place: IIT (BHU), Varanasi
Date:

Prof. Dr. Sanjay Kumar Singh
Department of Computer Science and Engineering
Indian Institute of Technology (BHU), Varanasi

Acknowledgement

It is always a pleasure to remember the fine people at IIT-BHU for the sincere
guidance I received, which helped me strengthen my practical as well as theoretical
skills in the area of Machine Learning and Deep Learning.
First of all, thanks to my parents for their encouragement, enthusiasm and
invaluable assistance. Without this, I might not have been able to complete this
project properly.
I would like to thank Dr. Sanjay Kumar Singh (Professor), Department of Computer
Science and Engineering, Indian Institute of Technology (BHU), Varanasi, for
accepting me as an intern, and my mentor Mr. Abhinav (Ph.D.), who was not only a
mentor but also a wonderful teacher. In this two-month journey of everyday meetings
with my mentor, I learnt many new things related to the theoretical and technical
knowledge of this project, as well as some important lessons about life, time
management, teamwork and motivation. Thank you so much, Abhinav Sir, Anshul Sir and
Vandana Ma'am; I am fortunate to have wonderful mentors like you. I would also like
to thank the management of the college and the Visual Computing and Analytics Lab
for providing the server and lab facilities for the project, and my fellow interns
and friends for their support, with whose help I could complete the project in the
stipulated time.

Abstract: Histopathology images are an important basis for pathologists to evaluate disease at
the cellular level. Histopathology allows studying the structure and function of cells, tissues, organs and
organ systems. It is the microscopic examination of biological tissues to observe the appearance of
diseased cells and tissues in very fine detail.

Over the past decade, dramatic increases in computational power and improvements in image analysis
algorithms have allowed the development of powerful computer-assisted analytical approaches to
radiological data. Breast cancer is the most common invasive cancer in females worldwide, and finding
it at an early stage can help prevent death from the disease. Traditionally, breast cancer has been
diagnosed by analysing H&E-stained slides of the affected area under a high-power microscope.
This process is clearly time consuming and requires a trained professional. In spite of significant
advances in diagnostic imaging technology, diagnosis of breast cancer, including grading and staging,
continues to be done by pathologists through visual inspection of histological samples under the
microscope. Recent advances in image processing and machine learning techniques allow building
computer-aided detection (CAD) systems that can help pathologists be more productive and
accurate in diagnosis.

The availability of new technology and a large amount of patient data has motivated the development
of new techniques to predict and detect breast cancer.

In this project I have prepared a deep learning model to predict cancer from histopathological images
using an auto-encoder and a CNN classifier. I have also tested several other methods: auto-encoder +
fuzzy SVM, auto-encoder + fuzzy ELM, transfer learning, and auto-encoder + simulated annealing.
Among all the methods I tried, I got the best result with auto-encoder + fuzzy SVM.

Table Of Contents
Declaration…………………………………………………………………………………………………2
Certificate……………………………………………………………………………………………….3
Acknowledgement……………………………………………………………………………………..4
Abstract……………………………………………………………………………………………………..5
1. Introduction ……………………………………………………………………………………………7
1.1 Auto-encoder………………………………………………………………………………8
1.2 Transfer learning…………………………………………………………………………9
1.3 Fuzzy SVM…………………………………………………………………………………..10
1.4 Simulated Annealing……………………………………………………………………11
2. Literature survey …………………………………………………………………………………….12
3. Proposed work ……………………………………………………………………………………….14
4. Description of dataset …………………………………………………………………………….16
5. Results and discussion …………………………………………………………………………….17
6. Conclusion and future work …………………………………………………………………….19
7. References……………………………………………………………………………………………….19
8. Appendix………………………………………………………………………………………………….20

1. INTRODUCTION
Deep learning is a growing technology in the field of machine learning and has
attracted the attention of many researchers and scientists. From the perspective
of vision-based measurement, nuclear medical instruments can not only visualize
images but also provide convenient detection and recognition assistance for
pathologists. We can therefore train a machine to predict cancer from histopathological
images.
I have extracted features from the images with an auto-encoder and used different
classifiers to classify the images. I also tried feature extraction from a pre-trained
model. I got the best result with the auto-encoder followed by a fuzzy SVM classifier.

1.1 Auto-encoder:
Despite their somewhat cryptic-sounding name, autoencoders are a fairly basic
machine learning model. Autoencoders (AE) are a family of neural networks for which
the input is the same as the output. They work by compressing the input into a latent-
space representation and then reconstructing the output from this representation.

[Figure: Autoencoder architecture]

In other words, autoencoding is a data compression algorithm where the compression
and decompression functions are:

1. Data-specific: Autoencoders are only able to compress data similar to what they
have been trained on. An autoencoder trained on human faces would not perform well
on images of modern buildings. This is what differentiates autoencoders from
general-purpose compression algorithms such as MP3, which only make assumptions
about sound in general, not about specific types of sounds.

2. Lossy: This means that the decompressed outputs will be degraded compared to the
original inputs, just as in JPEG or MP3 compression.

3. Learned automatically from examples: If you have appropriate training data, it is
easy to train specialized instances of the algorithm that will perform well on a
specific type of input. It doesn’t require any new engineering.

Additionally, in almost all contexts where the term “autoencoder” is used, the
compression and decompression functions are implemented with neural networks.
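As a concrete illustration, below is a minimal sketch of a fully connected autoencoder in Keras (TensorFlow backend assumed; the layer sizes and input dimension are illustrative and are not the architecture used later in this project). The encoder half can be reused on its own as a feature extractor.

from tensorflow.keras import layers, models

input_dim = 784      # e.g. a flattened 28x28 image (illustrative)
latent_dim = 32      # size of the compressed latent-space representation

inputs = layers.Input(shape=(input_dim,))
# Encoder: compress the input into the latent representation
encoded = layers.Dense(128, activation='relu')(inputs)
encoded = layers.Dense(latent_dim, activation='relu')(encoded)
# Decoder: reconstruct the input from the latent representation
decoded = layers.Dense(128, activation='relu')(encoded)
decoded = layers.Dense(input_dim, activation='sigmoid')(decoded)

autoencoder = models.Model(inputs, decoded)   # trained on (x, x) pairs
encoder = models.Model(inputs, encoded)       # reusable feature extractor
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=32)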

1.2 Transfer learning:


Transfer learning is a machine learning method where a model developed for a task is
reused as the starting point for a model on a second task.
In transfer learning, we first train a base network on a base dataset and task, and then we
repurpose the learned features, or transfer them, to a second target network to be trained
on a target dataset and task. This process will tend to work if the features are general,
meaning suitable to both base and target tasks, instead of specific to the base task.

In this project I have used the pre-trained VGG-16 model to extract features (a short sketch of this feature extraction is given after the architecture description below).

VGG Neural Networks. While previous derivatives of AlexNet focused on smaller window
sizes and strides in the first convolutional layer, VGG addresses another very important aspect of
CNNs: depth. Let’s go over the architecture of VGG:

[Figure: VGG-16 architecture]

 Input. VGG takes in a 224x224 pixel RGB image. For the ImageNet competition, the
authors cropped out the center 224x224 patch in each image to keep the input image size
consistent.

 Convolutional Layers. The convolutional layers in VGG use a very small receptive field
(3x3, the smallest possible size that still captures left/right and up/down). There are also 1x1
convolution filters which act as a linear transformation of the input, which is followed by a
ReLU unit. The convolution stride is fixed to 1 pixel so that the spatial resolution is
preserved after convolution.

 Fully-Connected Layers. VGG has three fully-connected layers: the first two have 4096
channels each and the third has 1000 channels, 1 for each class.

 Hidden Layers. All of VGG’s hidden layers use ReLU (a huge innovation from AlexNet
that cut training time). VGG does not generally use Local Response Normalization (LRN), as
LRN increases memory consumption and training time with no particular increase in
accuracy.
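To make the feature-extraction step concrete, here is a hedged sketch of using the pre-trained VGG-16 convolutional base as a fixed feature extractor with tensorflow.keras (assumed framework; the file name is hypothetical and the preprocessing details are illustrative).

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# include_top=False drops the three fully-connected layers, keeping only the
# convolutional base; the weights come from ImageNet pre-training.
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False   # freeze the convolutional base

img = image.load_img('sample_histopathology_patch.png', target_size=(224, 224))  # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = base_model.predict(x)   # shape (1, 7, 7, 512); flatten before feeding a classifier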

1.3 Fuzzy SVM

Fuzzy logic includes 0 and 1 as extreme cases of truth (or "the state of matters" or "fact") but also
includes the various states of truth in between so that, for example, the result of a comparison
between two things could be not "tall" or "short" but ".38 of tallness."
Fuzzy logic seems closer to the way our brains work. We aggregate data and form a number of
partial truths which we aggregate further into higher truths which in turn, when certain thresholds are
exceeded, cause certain further results such as motor reaction. A similar kind of process is used
in neural networks, expert systems and other artificial intelligence applications. Fuzzy logic is essential

to the development of human-like capabilities for AI, sometimes referred to as artificial general
intelligence: the representation of generalized human cognitive abilities in software so that, faced with
an unfamiliar task, the AI system could find a solution.

1.4 Simulated Annealing:

Simulated Annealing (SA) is an effective and general form of optimization. It is useful in finding global
optima in the presence of large numbers of local optima. “Annealing” refers to an analogy with
thermodynamics, specifically with the way that metals cool and anneal. Simulated annealing uses the
objective function of an optimization problem instead of the energy of a material.

Implementation of SA is surprisingly simple. The algorithm is basically hill climbing, except that
instead of picking the best move it picks a random move. If the selected move improves the solution,
then it is always accepted. Otherwise, the algorithm makes the move anyway with some probability less
than 1. The probability decreases exponentially with the “badness” of the move, which is the amount
deltaE by which the solution is worsened (i.e., energy is increased):

Prob(accepting uphill move) ~ exp(-deltaE / kT)

A parameter T is used to determine this probability. It is analogous to temperature in an annealing
system. At higher values of T, uphill moves are more likely to occur. As T tends to zero, they become
more and more unlikely, until the algorithm behaves more or less like hill climbing. In a typical SA
optimization, T starts high and is gradually decreased according to an “annealing schedule”. The
parameter k is some constant that relates temperature to energy (in nature it is Boltzmann’s constant).

Simulated annealing is typically used in discrete, but very large, configuration spaces, such as the set of
possible orders of cities in the Traveling Salesman problem and in VLSI routing. It has a broad range of
application that is still being explored.

Simulated-Annealing()
    Create initial solution S
    Initialize temperature t
    repeat
        for i = 1 to iteration-length do
            Generate a random transition from S to Si
            if C(Si) <= C(S) then S = Si
            else if exp((C(S) - C(Si)) / kt) > random(0, 1) then S = Si
        Reduce temperature t
    until no change in C(S)
    Return S
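Below is a small Python sketch of this loop, following the description above. The cost function and the neighbour-generation step are placeholders to be supplied by the user; the default temperature settings mirror the values used later in the proposed work (T=10.0, T_min=0.001, alpha=0.8), and the constant k is folded into the temperature.

import math
import random

def simulated_annealing(initial_solution, cost, neighbour,
                        t=10.0, t_min=0.001, alpha=0.8, iters_per_temp=100):
    s = initial_solution
    best = s
    while t > t_min:
        for _ in range(iters_per_temp):
            candidate = neighbour(s)                 # random transition from s
            delta = cost(candidate) - cost(s)
            if delta <= 0:                           # downhill move: always accept
                s = candidate
            elif random.random() < math.exp(-delta / t):   # uphill move: accept with prob exp(-dE/T)
                s = candidate
            if cost(s) < cost(best):
                best = s
        t *= alpha                                   # geometric annealing schedule
    return best

# Example (illustrative): minimise a simple 1-D function
# best_x = simulated_annealing(0.0, cost=lambda x: (x - 3) ** 2,
#                              neighbour=lambda x: x + random.uniform(-1, 1))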

2. Literature survey:
Auto-encoder:
Chang and Yu (1998) proposed a classification algorithm using enhanced 1-D
correlation to extract the features of halftone images and a back-propagation (BP)
neural network to classify these features. Their
method classified only four types of halftone images produced by clustered-dot
ordered dithering, dispersed-dot ordered dithering, constrained average, and
error diffusion. Kong et al. (2011) used enhanced 1-D correlation, gray level co-
occurrence matrix and gray run-length matrix to extract the periodic and
texture features of halftone image and then employed multistage decision to
classify halftone images into nine categories. Liu et al. (2011) separated halftone
images into nine classes according to features extracted in the Fourier spectrum,
using a naive Bayes classifier. Wen et al. (2014) sought the optimum class feature
matrices by minimizing the total square error according to the characteristic of
error diffusion filters and then used maximum likelihood to classify six types of
ED halftone images. However, the features of halftone images extracted by
different methods are adapted only to a certain class of halftone images; for
example, the method proposed by Wen et al. (2014) is suitable only for error
diffusion. To date there is no general method to extract halftone image features
across different classes, nor for feature extraction within the same class but with
different kernels or templates. The difficulty lies in the non-uniform distribution
of dot features among various classes of halftone images and the almost
imperceptible differences between halftone images produced by the same class of
halftoning with different halftoning patterns or diffusion filters. Thus, for real
application in inverse halftoning techniques, it is necessary to develop a general
classification mechanism for various halftone images.
VGG-16:

VGGNet was developed by the VGG (Visual Geometry Group) at the University of Oxford.
Although VGGNet was the 1st runner-up, not the winner, of the ILSVRC (ImageNet
Large Scale Visual Recognition Competition) 2014 classification task, it showed a
significant improvement over ZFNet (the winner in 2013) [2] and AlexNet (the winner
in 2012) [3]; GoogLeNet was the winner of the ILSVRC 2014 classification task.
Nevertheless, VGGNet beat GoogLeNet and won the localization task in ILSVRC 2014.

It was the first year that deep learning models obtained an error rate under 10%.
Most importantly, many other models have been built on top of VGGNet, or based on
VGGNet's 3×3 convolution idea, for other purposes or other domains.

Fuzzy – SVM:
On the basis of the theory of the classical SVM, Lin proposed the theory of the fuzzy
support vector machine in Ref. Chun. In the classical SVM, each sample is treated
equally; i.e., each input point is fully assigned to one of the two classes.
However, in many applications, some input points, such as outliers, may not be
exactly assigned to one of these two classes, and each point does not have the same
importance to the decision surface. To solve this problem, a fuzzy membership can be
assigned to each input point of the SVM, such that different input points make
different contributions to the construction of the decision surface (Ref. Chun).
Suppose the training samples are

S = {(X_i, y_i, s_i), i = 1, …, N},    (4)

where each X_i ∈ R^N is a training sample, y_i ∈ {−1, +1} represents its class
label, and s_i (i = 1, 2, …, N) is a fuzzy membership which satisfies σ ≤ s_i ≤ 1
for a sufficiently small constant σ > 0. Denote a set Q = {X_i | (X_i, y_i, s_i) ∈ S};
clearly, it contains two classes. One class contains the sample points X_i with
y_i = 1; denoting this class by C+, then

C+ = {X_i | X_i ∈ S and y_i = 1}.

The other class contains the sample points X_i with y_i = −1; denoting this class
by C−, then

C− = {X_i | X_i ∈ S and y_i = −1}.

Clearly Q = C+ ∪ C−.
The quadratic problem for classification can then be described as follows:

min  (1/2)‖w‖² + C Σ_{i=1}^{N} s_i ξ_i
subject to  y_i (w^T Φ(X_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, N,    (5)

where C is a constant. Since the fuzzy membership s_i is the attitude of the
corresponding point X_i toward one class and the parameter ξ_i is a measure of
error in the SVM, the term s_i ξ_i can be regarded as a measure of error with

different weights. It is noted that a smaller s_i reduces the effect of the
parameter ξ_i in problem (5), so that the corresponding point X_i is treated as
less important. Solving problem (5) is the same as for the classical SVM, with only
a small difference (Ref. Chun). Generally speaking, such quadratic programs can be
solved through their dual problems [11].
Choosing appropriate fuzzy memberships for a given problem is very important
for the FSVM. In Ref. Chun, the fuzzy membership function for reducing the effect
of outliers is a function of the distance between each data point and its
corresponding class centre, and the function is expressed in terms of the input
space. Given the sequence of training points (4), denote the means of class C+ and
class C− by X+ and X−, respectively. The radius of class C+ is

r+ = max ‖X+ − X_i‖, where X_i ∈ C+,    (6)

and the radius of class C− is

r− = max ‖X− − X_i‖, where X_i ∈ C−.    (7)

The fuzzy membership s_i is (Ref. Chun)

s_i = 1 − ‖X+ − X_i‖ / (r+ + δ)  if X_i ∈ C+,
s_i = 1 − ‖X− − X_i‖ / (r− + δ)  if X_i ∈ C−,    (8)

where δ > 0 is a constant introduced to avoid the case s_i = 0.

The FSVM with the above membership function can achieve good performance
since it is an averaging algorithm: a particular sample in the training set
contributes only a little to the final result, and the effect of outliers can be
eliminated by averaging over the samples [Ref. Chun, 7].
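As an illustration of how these memberships might be used in practice, here is a hedged Python sketch that computes s_i according to Eq. (8) and passes the memberships to a standard SVM as per-sample weights (scikit-learn is assumed; using sample_weight in this way is one simple approximation of the weighted slack term in problem (5), not the exact FSVM formulation of Ref. Chun).

import numpy as np
from sklearn.svm import SVC

def fuzzy_memberships(X, y, delta=1e-3):
    """Compute s_i = 1 - ||X(+/-) - X_i|| / (r(+/-) + delta) for each sample, as in Eq. (8)."""
    s = np.empty(len(y), dtype=float)
    for label in (+1, -1):
        mask = (y == label)
        centre = X[mask].mean(axis=0)                     # class mean X+ or X-
        dist = np.linalg.norm(X[mask] - centre, axis=1)   # distance of each sample to its class mean
        radius = dist.max()                               # class radius r+ or r-
        s[mask] = 1.0 - dist / (radius + delta)           # delta > 0 avoids s_i = 0
    return s

# X_train: (n_samples, n_features) feature matrix; y_train: labels in {-1, +1}
# s = fuzzy_memberships(X_train, y_train)
# clf = SVC(kernel='rbf', C=1.0)
# clf.fit(X_train, y_train, sample_weight=s)   # outliers far from their class centre count less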

3. Proposed Work:
Transfer learning:
Here I used the VGG-16 model as a feature extractor and a CNN classifier for classification.
The CNN classifier consists of 5 convolution layers, 2 dense layers and some dropout layers.
I trained this model for up to 100 epochs (a sketch of such a classifier is given below).
Optimizer: Adam; learning rate = 0.00001; filter size: 3×3
Loss: categorical cross-entropy
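The following is a hedged sketch of a CNN classifier of the kind described above: 5 convolution layers with 3×3 filters, 2 dense layers and dropout, compiled with Adam (learning rate 1e-5) and categorical cross-entropy. The filter counts, pooling and input shape are assumptions for illustration, not necessarily the exact configuration used in this project.

from tensorflow.keras import layers, models, optimizers

def build_cnn_classifier(input_shape, num_classes=2):
    model = models.Sequential()
    # 5 convolution layers with 3x3 filters (filter counts are illustrative)
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                            input_shape=input_shape))
    for filters in (64, 64, 128, 128):
        model.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    # 2 dense layers with dropout in between
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation='softmax'))
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# model = build_cnn_classifier(input_shape=(7, 7, 512))   # e.g. VGG-16 feature maps
# model.fit(train_features, train_labels_onehot, epochs=100, batch_size=32)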

Auto-encoder:
Architecture:
Algorithm 1: The training procedure for our DMAE
Input: the training set X = {x(1), x(2), …, x(n)}; learning rate α; hyper-parameter λ;
sparsity parameter β; parameter for the JM ξ; and iterative number It.
Output: the weights and biases {W(l), b(l)}, l = 1, …, L.
// Initialization
1: Initialize {W(l), b(l)}, l = 1, …, L, according to Equation (4).
2: Set h(0)(i) = x(i), i ∈ [1, n]. // Greedy layer-wise approach for pre-training the DMAE
3: for each l ∈ [1, L] do
4:     Set x(l)(i) = h(l−1)(i).
5:     for each t ∈ [1, It] do
6:         Do forward propagation to calculate y(l)(i) according to Equation (1) and Equation (2).
7:         Solve the optimization problem in Equation (3) to compute W(l) and b(l).
8:     end for
9:     Do forward propagation to obtain the representation h(l)(i) for each x(l)(i).
10: end for
11: return {W(l), b(l)}, l = 1, …, L.
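For intuition, below is a simplified Python sketch of the greedy layer-wise pre-training loop of Algorithm 1 (Keras is assumed). The manifold-preserving and sparsity terms of the DMAE objective (Equations (1)-(3) of the reference) are omitted here, so this reduces to training a plain stacked autoencoder layer by layer; layer sizes are illustrative.

from tensorflow.keras import layers, models

def greedy_layerwise_pretrain(x, layer_sizes, epochs=50):
    """Train one single-layer autoencoder per hidden layer; each layer's
    encoding h(l) becomes the input x(l+1) of the next layer."""
    h = x                                   # h(0) = x
    weights = []
    for size in layer_sizes:
        inp = layers.Input(shape=(h.shape[1],))
        enc = layers.Dense(size, activation='sigmoid')(inp)        # encoder of layer l
        dec = layers.Dense(h.shape[1], activation='sigmoid')(enc)  # decoder of layer l
        ae = models.Model(inp, dec)
        ae.compile(optimizer='adam', loss='mse')
        ae.fit(h, h, epochs=epochs, batch_size=32, verbose=0)      # reconstruct h from h
        weights.append(ae.layers[1].get_weights())                 # store W(l), b(l)
        encoder = models.Model(inp, enc)
        h = encoder.predict(h, verbose=0)                          # representation h(l)
    return weights

# pretrained = greedy_layerwise_pretrain(x_train_flat, layer_sizes=[512, 128, 64])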

[Figure: Auto-encoder architecture]

My model has 3 convolution layers, 2 max-pooling layers and two batch-normalization layers
(a sketch of this architecture is given below). I trained the model with this architecture for
around 150 epochs, and with the encoder part I extracted the features of the images and saved them.
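Here is a hedged sketch of such a convolutional autoencoder, with 3 convolution layers, 2 max-pooling layers and 2 batch-normalization layers in the encoder (tensorflow.keras is assumed; filter counts and the input shape are illustrative assumptions, not necessarily the exact configuration used in the project).

from tensorflow.keras import layers, models

def build_conv_autoencoder(input_shape=(128, 128, 3)):
    inp = layers.Input(shape=input_shape)
    # Encoder: 3 convolution layers, 2 max-pooling layers, 2 batch-normalization layers
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    encoded = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    # Decoder: mirror of the encoder, upsampling back to the input size
    x = layers.UpSampling2D((2, 2))(encoded)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    decoded = layers.Conv2D(input_shape[-1], (3, 3), activation='sigmoid', padding='same')(x)
    autoencoder = models.Model(inp, decoded)
    encoder = models.Model(inp, encoded)    # reused to extract and save features
    autoencoder.compile(optimizer='adam', loss='mse')
    return autoencoder, encoder

# autoencoder, encoder = build_conv_autoencoder()
# autoencoder.fit(x_train, x_train, epochs=150, batch_size=32)
# features = encoder.predict(x_test)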
Using those features, I performed binary classification with the following classifiers.
1. CNN classifier: I fed the extracted features into a CNN model consisting of 5 convolution layers,
   2 dense layers and some dropout layers, and trained it for up to 100 epochs.
   Optimizer: Adam; learning rate = 0.00001; filter size: 3×3
   Loss: categorical cross-entropy
2. Fuzzy SVM: I fuzzified the features and used an SVM classifier; I obtained the best result with this method.
3. Simulated annealing: I fed the features into the proposed algorithm and recorded the result, with
   T=10.0, T_min=0.001, alpha=0.8, max_iter=0.25, n_trans=5, max_runtime=300, cv=3, scoring='f1_macro'.
I executed all the above methods on a Tesla P100-PCIE-16GB GPU provided by Kaggle.

4. Description of dataset:
The Breast Cancer Histopathological Image Classification (BreakHis) dataset is composed of 9,109 microscopic
images of breast tumor tissue collected from 82 patients using different magnifying factors (40X,
100X, 200X, and 400X). To date, it contains 2,480 benign and 5,429 malignant samples (700×460
pixels, 3-channel RGB, 8-bit depth in each channel, PNG format). This database was built in
collaboration with the P&D Laboratory – Pathological Anatomy and Cytopathology, Parana, Brazil
(http://www.prevencaoediagnose.com.br). We believe that researchers will find this database a useful
tool, since it makes future benchmarking and evaluation possible.
The BreakHis 1.0 dataset is structured as follows:

Magnification      Benign    Malignant    Total
40X                625       1,370        1,995
100X               644       1,437        2,081
200X               623       1,390        2,013
400X               588       1,232        1,820
Total of images    2,480     5,429        7,909

These images are split into training and testing sets in different folds; there are 5 such folds,
and I have tested my model on every fold of the dataset.

5. Results and discussion:


Transfer learning:
First, I used transfer learning. I extracted features from the histopathological images with the
help of VGG-16 (a pre-trained model) and then used a CNN classifier. I obtained the following plots
of training and testing accuracy.

[Plots of training and testing accuracy: 40X (fold 3), 100X (fold 3), 200X (fold 3), 400X (fold 3)]


Transfer Learning (VGG-16), accuracy on fold 3:

Magnification    Accuracy (%)
40X              86.3
100X             84.15
200X             93.32
400X             81.41
Auto-encoder:
While training with the proposed auto-encoder architecture, I obtained the following plots and results.
[Plots of training and testing loss: 40X (fold 3), 100X (fold 3), 200X (fold 3), 400X (fold 3)]
Since the training and testing loss curves overlap, the model is not overfitting.

Auto-encoder with CNN classifier:
Optimizer: Adam; learning rate = 0.00001; batch size = 32; loss = categorical cross-entropy

Accuracy (%)
Magnification    Fold1    Fold2    Fold3    Fold4    Fold5    Average
40X              77.71    73.1     87.06    81       85.54    80.882
100X             75.26    74.83    85.35    78.51    81.72    79.134
200X             81.85    80.06    93.77    84.53    85.06    85.054
400X             82.74    78.7     82.09    78.42    84.93    81.316

After using fuzzy SVM as the classifier:
The features extracted from the auto-encoder were used and the accuracy was noted.

Accuracy (%)
Magnification    Fold1    Fold2    Fold3    Fold4    Fold5    Average
40X              77.91    80.05    87.65    82.79    84.06    82.492
100X             75.57    77.98    88.57    82.39    84.21    81.744
200X             81.31    85.75    97.72    88.77    88.58    88.426
400X             78.62    81.22    88.51    83.93    85.23    83.502

Simulated annealing on fold 3:
The features extracted from the auto-encoder were clustered with simulated annealing and classified with an SVM.

Magnification    Accuracy (%)
40X              77.92
100X             74.58
200X             90.59
400X             79.05

6. Conclusion and future work:

I tried and applied different deep learning methods in order to achieve a decent accuracy.
However, in this type of classification accuracy alone cannot be taken into account; precision and
loss must also be considered.
Overall, I obtained my best result with the fuzzy SVM on binary classification.
In the future I want to improve my model so that it works efficiently on multi-class classification.
For this, I will have to pre-process the images more effectively and build the model accordingly.

7. References:
1. Deep Manifold Preserving Autoencoder for Classifying Breast Cancer Histopathological Images,
   Yangqin Feng, Lei Zhang, Juan Mo (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8417906)
2. A Gentle Introduction to Transfer Learning for Deep Learning
   (https://machinelearningmastery.com/transfer-learning-for-deep-learning/)
3. Kaggle (https://www.kaggle.com/)
4. Autoencoder as Classifier (Tutorial) (datacamp.com)
5. BreakHis Dataset (https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/)
6. Hybrid approach using fuzzy sets and extreme learning machine for classifying clinical datasets
   (sciencedirect.com)
7. VGG Neural Network (towardsdatascience.com)
8. AMOSA (https://www.isical.ac.in/~sriparna_r/ieeetec.pdf)
9. Simulated Annealing (http://www.cs.cmu.edu)
10. YouTube Tutorials (www.youtube.com)
11. Google Scholar (scholar.google.co.in)
12. GitHub (github.com)
13. Wikipedia (wikipedia.com)

8. Appendix
Breast Cancer Further Study (https://www.cancer.net/cancer-types/breast-cancer/statistics)

More women are diagnosed with breast cancer than any other cancer, besides skin
cancer. This year, an estimated 268,600 women in the United States will be diagnosed
with invasive breast cancer, and 62,930 women will be diagnosed with in situ breast
cancer. An estimated 2,670 men in the United States will be diagnosed with breast
cancer.
It is estimated that 42,260 deaths (41,760 women and 500 men) from breast cancer will
occur this year.
The 5-year survival rate tells you what percent of people live at least 5 years after the
cancer is found. Percent means how many out of 100. The average 5-year survival rate
for women with invasive breast cancer is 90%. The average 10-year survival rate is
83%.
If the cancer is located only in the breast, the 5-year survival rate of women with breast
cancer is 99%. Sixty-two percent (62%) of cases are diagnosed at this stage. If the
cancer has spread to the regional lymph nodes, the 5-year survival rate is 85%. If the
cancer has spread to a distant part of the body, the 5-year survival rate is 27%.
About 6% of women have metastatic cancer when they are first diagnosed with breast
cancer. Even if the cancer is found at a more advanced stage, new treatments help
many people with breast cancer maintain a good quality of life, at least for some time.
It is important to note that these statistics are averages, and each person’s risk depends
on many factors, including the size of the tumor, the number of lymph nodes that
contain cancer, and other features of the tumor that affect how quickly a tumor will grow
and how well treatment works. This means that it can be difficult to estimate each
person's chance of survival.

Breast cancer is the second most common cause of death from cancer in women in the
United States, after lung cancer. However, since 1989, the number of women who have
died of breast cancer has steadily decreased thanks to early detection and treatment
improvements.

Currently, there are more than 3 million women who have been diagnosed with breast
cancer in the United States.

It is important to remember that statistics on the survival rates for people with breast
cancer are an estimate. The estimate comes from annual data based on the number of
people with this cancer in the United States. Also, experts measure the survival
statistics every 5 years. So the estimate may not show the results of better diagnosis or
treatment available for less than 5 years.

