
Data Mining Techniques
By -
Priyank Yadav
CSE
What is data mining?
• With the enormous amount of data stored in files and repositories, data
mining is the analysis and interpretation of that data to extract useful
information that can help in decision making.
• Also called knowledge discovery in databases.
• Data mining is one of the crucial steps in the iterative knowledge
discovery process.
Data mining techniques:
There are a number of techniques for data mining. Some of them are
described below:

1) Association:
• Association is probably the most familiar data mining technique.
• Here you make a simple correlation between two or more items, often
of the same type, to identify patterns.
For example, when tracking people’s buying habits, you might identify
that a person always buys cream when they buy strawberries, and
therefore suggest that the next time that they buy strawberries they
might also want to buy cream.
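The strawberries-and-cream rule can be quantified with two standard measures, support and confidence. The sketch below is only an illustration: the baskets are invented, and the helper names `support` and `confidence` are assumptions, not something taken from the slides.

```python
# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"strawberries", "cream"},
    {"strawberries", "cream", "sugar"},
    {"strawberries", "cream", "milk"},
    {"bread", "milk"},
    {"cream", "coffee"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# How often the pair occurs, and how often cream follows strawberries.
rule_support = support({"strawberries", "cream"}, baskets)
rule_confidence = confidence({"strawberries"}, {"cream"}, baskets)
print(rule_support, rule_confidence)
```

In this toy data every basket with strawberries also holds cream, so the rule "strawberries, therefore cream" has the highest possible confidence, which is the situation described in the example above.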
2) Classification

• You can use classification to build up an idea of the type of customer,
item, or object by describing multiple attributes to identify a
particular class.
For example, you can easily classify cars into different types (sedan, 4x4,
convertible) by identifying different attributes (number of seats, car
shape, color of the car). Given a new car, you might assign it to a
particular class by comparing its attributes with the known definitions.
You can also apply the same principles to customers by classifying them
into age groups, income brackets, and so on.
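Comparing a new car's attributes with known class definitions can be sketched as a simple attribute match. This is a minimal illustration, not a learned classifier such as a decision tree; the class definitions, attribute values, and the `classify` helper are all assumptions made for the example.

```python
# Hypothetical class definitions; the attribute values are illustrative assumptions.
KNOWN_CLASSES = {
    "sedan": {"seats": 5, "shape": "three-box", "roof": "fixed"},
    "4x4": {"seats": 5, "shape": "two-box", "roof": "fixed"},
    "convertible": {"seats": 2, "shape": "three-box", "roof": "folding"},
}

def classify(car, known_classes=KNOWN_CLASSES):
    """Return the class whose definition shares the most attribute values with car."""
    def matches(definition):
        # Count the attributes on which the car agrees with this definition.
        return sum(car.get(key) == value for key, value in definition.items())
    # On a tie, max() keeps the class that was defined first.
    return max(known_classes, key=lambda name: matches(known_classes[name]))

# The color attribute is ignored because no definition above mentions it.
new_car = {"seats": 2, "shape": "three-box", "roof": "folding", "color": "red"}
label = classify(new_car)
print(label)
```

The same matching idea would apply to customers, with attributes such as age group or income bracket in place of seats and shape.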
3) Clustering

• By examining one or more attributes or classes, you can group individual
pieces of data together to form a structured view. At a simple level,
clustering uses one or more attributes as the basis for identifying a
cluster of correlating results.
• Clustering is useful for identifying distinct groups of information because
it correlates each item with other examples, so you can see where the
similarities and ranges agree.
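Grouping data by its attributes can be sketched with a plain k-means loop. The version below is a simplified illustration under stated assumptions: the customer data is invented, the `kmeans` function is a hypothetical helper, and the first k points are used as starting centroids to keep the run deterministic (real implementations usually pick starting points more carefully).

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of numbers; starts from the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical (age, yearly spend in thousands) pairs.
customers = [(22, 2), (45, 9), (25, 3), (47, 10), (23, 2), (50, 11)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)
```

With this data the younger, low-spend customers and the older, high-spend customers end up in separate clusters, which is the kind of similarity grouping described above.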
Application of data mining techniques in bioinformatics:
• Bioinformatics, an emerging field that involves the use of large
databases, can use data mining to derive useful rules.
• Based on the type of knowledge that is mined, data mining techniques can
be classified into association rules, classification using decision trees, and
clustering. Until recently, biology lacked the tools to analyze large
repositories of data such as the human genome database.
• Data mining techniques are used to extract meaningful relationships
from these data. Data mining is used in microarray analysis, which is used
to study the activity of different cells under different conditions.
Two algorithms under each mining technique were studied here:
1. Association rule mining: a) Apriori b) Partition
2. Clustering: a) k-means b) k-medoids
3. Classification: decision tree generation using a) Gini index b) entropy value
Genetic algorithms were applied to the association and classification techniques.
K-means clustering and DBSCAN (density-based spatial clustering of
applications with noise) were applied to a microarray dataset and compared.
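As an illustration of the first association rule algorithm listed above, here is a minimal sketch of Apriori's level-wise search for frequent itemsets. It is not the implementation used in the study: the `apriori` function, the item names, and the support threshold are assumptions chosen for the example, and the Partition algorithm is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are the single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent itemsets into size-k candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical transactions over three items A, B, and C.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = apriori(transactions, min_support=3)
print(sorted("".join(sorted(s)) for s in frequent))
```

Each pass scans the whole transaction list once per level, which is one reason the choice between Apriori and a partition-based approach can depend on database size.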
Results:
1) For smaller databases, the Apriori algorithm works better than the Partition
algorithm, but for larger databases Partition works better.
2) With respect to the number of interchanges, k-medoids works better
than k-means.
3) The results were similar for the Gini index and the entropy value.
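To make the Gini index and entropy comparison concrete, the sketch below computes both impurity measures for two candidate splits of a small labelled set. The labels and splits are invented, and the function names are assumptions; this shows only how the measures are defined, not the data or results of the study.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def weighted_impurity(groups, measure):
    """Impurity of a split: each group's impurity weighted by its share of the rows."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * measure(group) for group in groups)

# Hypothetical splits of eight rows (four "yes", four "no").
split_a = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

for name, split in (("A", split_a), ("B", split_b)):
    print(name, weighted_impurity(split, gini), weighted_impurity(split, entropy))
```

A decision tree builder would pick the split with the lower weighted impurity. In this toy case both measures prefer split A, since split B leaves each group as mixed as the original set.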
• Bioinformatics involves the manipulation, searching, and data mining of
DNA sequence data.
• The evolution of these techniques has also helped other fields, such as
string search algorithms, machine learning, and database theory.
