(IJCST-V1I2P7) : T.Shanmugavadivu, T.Ravichandran
RESEARCH ARTICLE
OPEN ACCESS
ABSTRACT
Classification of patient samples is a crucial aspect of cancer diagnosis and treatment. We present a method for classifying samples by computational analysis of gene expression data. We divide the classification problem into two parts: class discovery and class prediction. Class discovery refers to the process of dividing samples into reproducible classes that have similar behavior or properties, while class prediction places new samples into already known classes. We describe a method for performing class prediction and illustrate its strength by correctly classifying bone marrow and blood samples from acute leukemia patients. We also describe how to use our predictor to validate newly discovered classes.
Keywords:- Gene class, Feature Selection, Feature Prediction.
I. INTRODUCTION
Expression profiling experiments often involve measuring the relative amount of mRNA expressed in two or more experimental conditions. This is because altered levels of a specific mRNA sequence suggest a changed need for the protein coded for by that mRNA, perhaps indicating a homeostatic response or a pathological condition. For example, higher levels of mRNA coding for alcohol dehydrogenase suggest that the cells or tissues under study are responding to increased levels of ethanol in their environment. Similarly, if cancer cells express higher levels of mRNA associated with a particular transmembrane receptor than normal cells do, this receptor might play a role in cancer, and a drug that interferes with it may prevent or treat cancer. In developing a drug, one may perform gene expression profiling experiments to help assess the drug's toxicity, perhaps by looking for changes in the expression of cytochrome P450 genes, a biomarker of drug metabolism. Gene expression profiling may also become an important diagnostic test.
II. SYSTEM MODEL
An assessment of the proportion of an individual mRNA species found in polysomes may provide a means to identify mRNA features that contribute to translational regulation. Such an evaluation would also require knowledge of the full-length sequence of the mature transcript. There are over 28 000 publicly available full coding-region cDNA sequences for Arabidopsis. These cDNAs provide reliable coding and 3′-UTR sequence information, but may not begin at the 5′ terminus of the mRNA. Full-length cDNA sequencing has allowed for the characterization of over 14 000 full-length cDNAs (FL-cDNAs) with 5′-UTR sequences. These high-quality cDNA and FL-cDNA sequence data provide a valuable resource for bioinformatic characterization of the features of the 5′-UTR, coding sequence and 3′-UTR sequences that underlie variation in translational regulation. In this report, a quantitative assessment of the proportion of mRNA in polysomes for over 11 000 genes was used to evaluate the significance of general mRNA sequence features on translational regulation under non-stress (NS) and DS conditions in Arabidopsis.
III. CLASSIFICATION METHODS
Nearest-neighbor classification methods are based on a distance function for pairs of tumor mRNA samples, such as the Euclidean distance or one minus the correlation of their gene expression profiles. The k-nearest neighbor rule proceeds as follows to classify test set observations on the basis of the learning set. For each tumor sample in the test set, (a) find the k closest tumor samples in the learning set, and (b) predict the class by majority vote; that is, choose the class that is most common among those k neighbors. The number of neighbors k is chosen by cross-validation, that is, by running the classifier on the learning set only. Each tumor sample in the learning set is treated in turn as if it were in the test set: its distance to all of the other learning set tumor samples (except itself) is computed, and it is classified by the rule. The classification for each learning set observation is then compared to the truth to produce the cross-validation error rate. This is done for a number of values of k (here $k \in \{1, 3, 5, \ldots, 21\}$), and the k for which the cross-validation error rate is smallest is retained for use on the test set.
ISSN: 2393-9516
www.ijetajournal.org
International Journal of Engineering Trends and Applications (IJETA) Volume 1 Issue 2, Sep-Oct 2014
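The k-nearest neighbor rule and the leave-one-out choice of k described in Section III can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: samples are plain lists of expression values, vote ties go to the smallest class label (the text does not say how ties are resolved), and the function names `euclidean`, `one_minus_corr`, `knn_predict`, `loocv_error` and `choose_k` are hypothetical. The default candidate grid 1, 3, 5, ..., 21 corresponds to the values of k listed in the text; the toy profiles in the usage block are made up.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two gene expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def one_minus_corr(x, y):
    """One minus the Pearson correlation of two gene expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def knn_predict(sample, train_X, train_y, k, dist=euclidean):
    """Majority vote among the k learning-set samples closest to `sample`."""
    order = sorted(range(len(train_X)), key=lambda i: dist(sample, train_X[i]))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    # Assumption: a tied vote goes to the smallest label.
    return min(label for label, count in votes.items() if count == top)


def loocv_error(train_X, train_y, k, dist=euclidean):
    """Leave-one-out error rate: classify each learning sample from all the others."""
    errors = 0
    for i in range(len(train_X)):
        rest_X = train_X[:i] + train_X[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        if knn_predict(train_X[i], rest_X, rest_y, k, dist) != train_y[i]:
            errors += 1
    return errors / len(train_X)


def choose_k(train_X, train_y, candidates=range(1, 22, 2), dist=euclidean):
    """Return the candidate k with the smallest cross-validation error.
    The first such k wins ties; candidates of n or more are skipped."""
    usable = [k for k in candidates if k < len(train_X)]
    return min(usable, key=lambda k: loocv_error(train_X, train_y, k, dist))


if __name__ == "__main__":
    # Toy learning set: two well-separated groups of 2-gene profiles.
    X = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
    y = ["A", "A", "A", "B", "B", "B"]
    k = choose_k(X, y)
    print(k, knn_predict([0.3, 0.2], X, y, k))  # prints: 1 A
```

Passing `dist=one_minus_corr` switches to the correlation-based distance; it is undefined for a profile with zero variance, so constant profiles would need to be filtered out first.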
Generation of trees:
Step1: Let n be the number of samples in the training data S.
Step2: Assign equal weight 1/n to each sample in S.
Step3: For each of k iterations:
Step4: Apply the decision tree algorithm to the weighted samples.
    Compute the error e of the obtained tree on the weighted samples.
    If e is equal to zero:
        Store the obtained tree.
        Terminate generation of trees.
    Otherwise, store the obtained tree together with its error e.
Step5: For each sample in S:
Step6: If the sample is classified correctly by the obtained tree:
        Multiply the weight of the sample by e/(1-e).
Step7: Normalize the weights of all samples.
Classification
Step1: Given a new sample.
Step2: Assign a weight of zero to all classes.
Step3: For each stored tree:
    Add -log(e/(1-e)) to the weight of the class predicted by the tree, where e is the error stored with that tree.
Step4: Return the class with the highest weight.
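The tree-generation and classification steps above closely resemble the AdaBoost.M1 boosting scheme. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation: a one-level decision tree (a "stump" that thresholds a single gene) stands in for the unspecified decision tree algorithm; every tree is stored together with its error e, since classification needs each tree's e; generation also stops when e >= 0.5, the usual AdaBoost.M1 safeguard, which the steps above do not state; and a tree with e = 0 receives a large finite vote instead of -log(0). The names `fit_stump`, `generate_trees` and `classify` are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """One-level decision tree on weighted samples (stand-in for the unspecified
    tree learner). Tries every feature j and threshold t, predicts the
    weighted-majority class on each side, returns (tree, weighted error)."""
    classes = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = dict.fromkeys(classes, 0.0)
            right = dict.fromkeys(classes, 0.0)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            lc = max(classes, key=left.get)
            rc = max(classes, key=right.get)
            err = (sum(left.values()) - left[lc]) + (sum(right.values()) - right[rc])
            if best is None or err < best[0]:
                best = (err, j, t, lc, rc)
    err, j, t, lc, rc = best

    def tree(x):
        return lc if x[j] <= t else rc

    return tree, err


def generate_trees(X, y, k):
    """Generation of trees: returns the stored (tree, e) pairs."""
    n = len(X)                              # Step 1
    w = [1.0 / n] * n                       # Step 2: equal weight 1/n
    trees = []
    for _ in range(k):                      # Step 3: k iterations
        tree, e = fit_stump(X, y, w)        # Step 4: fit tree, weighted error e
        if e == 0:
            trees.append((tree, e))         # perfect tree: store it and stop
            break
        if e >= 0.5:                        # assumed AdaBoost.M1 safeguard
            break
        trees.append((tree, e))
        for i in range(n):                  # Steps 5-6: shrink the weights of
            if tree(X[i]) == y[i]:          # correctly classified samples
                w[i] *= e / (1 - e)
        total = sum(w)                      # Step 7: normalize
        w = [wi / total for wi in w]
    return trees


def classify(sample, trees, perfect_vote=1e9):
    """Each tree adds -log(e/(1-e)) to its predicted class; highest weight wins.
    Raises ValueError if no tree was stored."""
    weight = {}
    for tree, e in trees:
        # Assumption: a zero-error tree gets a large finite vote, not -log(0).
        vote = perfect_vote if e == 0 else -math.log(e / (1 - e))
        label = tree(sample)
        weight[label] = weight.get(label, 0.0) + vote
    return max(weight, key=weight.get)
```

Because the weights are renormalized to sum to one, the weighted error returned by `fit_stump` is directly the e used in both the reweighting factor e/(1-e) and the vote -log(e/(1-e)).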
VI. CONCLUSION
We define the problem of class discovery and distinguish it as a special subclass of the broad category of clustering problems. We describe how to efficiently compute the statistical significance of how well individual genes separate tissue classes (for both the TNoM and the INFO methods). Based on these efficient methods, we
propose several criteria for evaluating the statistical
significance of putative sample classifications. The
central idea is to quantify the overabundance of genes
that are informative with respect to any such putative
classification. We then combine these methods with
search heuristics and develop an efficient search
procedure for finding multiple significant classifications
in data sets.
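The TNoM score mentioned above can be made concrete. For one gene, TNoM (threshold number of misclassifications) counts the fewest errors made by any single-threshold rule that assigns one class to samples expressed above the threshold and the other class to samples below it. The Python sketch below computes that score and a p-value under two assumptions: class labels are coded +1 and -1, and the p-value comes from brute-force enumeration of every relabeling with the same class sizes. This enumeration is a simple stand-in for the efficient computation the text refers to; it is exponential in the number of samples, so it only suits small examples. The names `tnom` and `tnom_pvalue` are hypothetical.

```python
from itertools import combinations


def tnom(values, labels):
    """TNoM score of one gene: the fewest misclassifications achieved by the
    rule '-1 below t, +1 above t' or its reverse, over all thresholds t.
    `labels` holds one +1 / -1 class label per sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    vals = [values[i] for i in order]
    labs = [labels[i] for i in order]
    n, total_pos = len(vals), labs.count(1)
    best = n
    pos_below = 0
    for split in range(n + 1):               # split = samples below the threshold
        if split > 0 and labs[split - 1] == 1:
            pos_below += 1
        if 0 < split < n and vals[split - 1] == vals[split]:
            continue                         # no threshold separates equal values
        neg_below = split - pos_below
        pos_above = total_pos - pos_below
        neg_above = (n - split) - pos_above
        # '-1 below, +1 above' errs on pos_below + neg_above; the reverse on the rest.
        best = min(best, pos_below + neg_above, neg_below + pos_above)
    return best


def tnom_pvalue(values, labels):
    """Exact P(TNoM <= observed) when labels are assigned uniformly at random
    with the same class sizes, by enumerating every such labeling."""
    n, n_pos = len(values), labels.count(1)
    observed = tnom(values, labels)
    hits = total = 0
    for pos_idx in combinations(range(n), n_pos):
        chosen = set(pos_idx)
        relabeled = [1 if i in chosen else -1 for i in range(n)]
        total += 1
        if tnom(values, relabeled) <= observed:
            hits += 1
    return hits / total


if __name__ == "__main__":
    expr = [1.2, 0.8, 1.0, 3.1, 2.9, 3.4]    # one gene measured in six samples
    cls = [-1, -1, -1, 1, 1, 1]
    print(tnom(expr, cls), tnom_pvalue(expr, cls))  # prints: 0 0.1
```

In the usage example the gene separates the two classes perfectly, and only 2 of the 20 possible 3/3 labelings achieve that, which gives p = 0.1.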
The main criterion we use in searching for new
classifications is the max-surprise score. This score is
appealing both because of its clear definition and because
it can be efficiently evaluated. Our evaluation on
synthetic data shows that searching using the max-surprise score can recover a true classification under a
wide range of operating parameters including the number
of relevant and irrelevant genes, the amount of variance
in the expression level, and the difference between the
expression levels of genes in the two classes.
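To make the operating parameters listed above concrete, the Python sketch below generates a two-class synthetic expression matrix in which they can be varied: the number of relevant and irrelevant genes, the noise level (sigma) and the class difference (delta). The generative model is purely our assumption for illustration (relevant genes shift their mean by delta in the second class, irrelevant genes do not, and all values receive Gaussian noise); the text does not specify its own model, and the name `synthetic_expression` is hypothetical.

```python
import random


def synthetic_expression(n_per_class, n_relevant, n_irrelevant, sigma, delta, seed=0):
    """Hypothetical two-class generator. Rows are genes, columns are samples.
    Relevant genes: mean 0 in class 0 and mean `delta` in class 1.
    Irrelevant genes: mean 0 in both classes. Every value gets N(0, sigma^2) noise."""
    rng = random.Random(seed)                    # seeded for reproducible data sets
    labels = [0] * n_per_class + [1] * n_per_class
    matrix = []
    for g in range(n_relevant + n_irrelevant):
        shift = delta if g < n_relevant else 0.0  # first n_relevant rows are relevant
        matrix.append([rng.gauss(shift * c, sigma) for c in labels])
    return matrix, labels


if __name__ == "__main__":
    # Sweep the noise level while keeping the class difference fixed.
    for sigma in (0.5, 1.0, 2.0):
        data, labels = synthetic_expression(10, 20, 180, sigma, delta=1.0)
        print(sigma, len(data), len(data[0]))
```

A search procedure can then be scored by whether it recovers `labels` as the number of relevant genes, sigma or delta changes.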