

Introduction –

Biotechnology has been constantly reinventing itself over the past three decades as
a result of new advances in techniques, reagents, instrumentation, molecular
biology software, biological data mining, and the viable commercial implications of this
discipline and its allied sciences. A vast amount of biological data is available today,
and it grows every day as more and more data are added.

Why is it called data mining?

Mining is a process of sifting through large quantities of ore in order to obtain purified
metal. Similarly, data mining is a process that involves browsing a vast amount of data to
extract real information that can be useful to decision makers. As with gold mining or
mining for any precious metal, the raw material, here the data itself, is critical to data
mining. The importance of quality data in data mining cannot be overstated, since the
quality of the data determines the outcome of the data mining.

Approaches for developing data mining applications :-

• Influence-based data mining
Large databases containing complex and granular data are scanned for
influences between specific data sets. The scanning of such complex data is done
along many dimensions and in multi-table formats. Influence-based data mining
systems are useful in situations where there are significant cause-and-effect
relationships between data sets.
• Affinity-based data mining
Affinity-based data mining is a variant of influence-based data mining. Large and
complex data sets are analyzed across multiple dimensions, and the data mining
system identifies data points or sets that tend to be grouped together. These systems
differentiate themselves by providing hierarchies of associations and showing
any underlying logical conditions or rules that account for the specific groupings
of data.
• Time-delay data mining
In time-delay data mining, the data set is collected over a period of time and
hence is not complete. The set is also not available immediately. Systems
designed to handle such data look for patterns that are confirmed or rejected as
the data set grows and becomes more robust. Time-delay data mining is
suitable for the analysis of long-term clinical trials and studies on
multi-component modes of action.
• Trend-based data mining
In trend-based data mining, the software system analyses large and complex
data sets for any changes that occur in specific data sets over a period
of time. The data sets can be user-defined, or the system can uncover them
itself. Essentially, the system reports on anything that is changing over time. This
is especially important in cause-and-effect biological experiments.
• Comparative data mining
Large and complex data sets that are akin to each other are compared in comparative
data mining exercises. Comparative data mining is especially valuable in
meta-analyses of almost all clinical trials. In clinical trials, meta-analysis includes
comparison of data collected at different time periods and under similar
conditions. Finding dissimilarities, rather than similarities, is the emphasis in
comparative data mining.
• Predictive data mining
The five data mining approaches discussed above do not offer a framework
for making simulations, predictions and forecasts based on the data sets they
analyse. Predictive data mining combines pattern matching, influence
relationships, time-set correlations, and dissimilarity analysis to offer simulations of
future data sets. Predictive data mining systems are capable of incorporating
entire data sets into their workings, and not just samples, which makes their
accuracy significantly higher. Predictive data mining is used in clinical trial
analysis and in establishing structure-function correlations.
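
As a rough illustration of two of the approaches listed above, the sketch below counts
which items tend to occur together (affinity-based) and reports whether a measurement
changes over time (trend-based). It is a minimal sketch under stated assumptions, not a
description of any particular data mining product: the function names, the gene labels,
the sample values and the min_support and tolerance parameters are all hypothetical, and
the trend check simply compares the first and last values.

```python
from collections import Counter
from itertools import combinations


def affinity_pairs(records, min_support=2):
    """Count how often two items appear in the same record (affinity-based)."""
    counts = Counter()
    for record in records:
        # Sort the unique items so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    # Keep only pairs seen together at least min_support times.
    return {pair: n for pair, n in counts.items() if n >= min_support}


def trend_direction(values, tolerance=0.0):
    """Classify a time-ordered series as rising, falling or flat (trend-based).

    Simplifying assumption: only the first and last values are compared.
    """
    change = values[-1] - values[0]
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "flat"


# Hypothetical records: genes found to be over-expressed in each sample.
samples = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneB", "geneC"],
    ["geneA", "geneB", "geneD"],
]
pairs = affinity_pairs(samples, min_support=2)

# Hypothetical measurements of one marker at successive time points.
trend = trend_direction([1.0, 1.4, 1.9, 2.6])
```

With these sample records, geneA and geneB appear together in three samples and geneB
and geneC in two, so only those two pairs pass the support threshold. A real system would
add the hierarchies of associations and the underlying rules that the text describes.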

Types of data mining analysis :-

• Transaction data
In a business organization, transaction data is very important because it describes
customer profiles, their actions and, in turn, their behavior. “Past behavior is the
best predictor of future behavior” is the essence of customer behavior prediction,
and psychologists have established this over a period of time.
• Purchased data
A subset of useful information is the outcome of an interactive discussion with an
organization or an individual. Nowadays, a web of industries exists which
provides very useful supplementary data about the intended subject.
Laboratories involved in genome projects and sequencing, and R&D laboratories for
biologicals, diagnostics, and molecular drug screening and development, etc.,
are the companies that provide resource data for data mining purposes in the
respective fields of biotechnology. Such data can be purchased outright, or a
mutually agreeable contractual arrangement can be made for sharing the data.
