
Data preprocessing is a data mining technique that transforms raw data into an understandable format.

To achieve this, data transformation and data reduction have to be carried out.

In the data transformation process, the data is transformed into forms appropriate for mining.

Data transformation involves the following:

• Aggregation, where summary or aggregation operations are applied to the data. For
example, the daily sales data may be aggregated so as to compute monthly and annual total
amounts. This step is typically used in constructing a data cube for analysis of the data at
multiple granularities.
• Generalization of the data, where low-level or “primitive” (raw) data are replaced by higher-
level concepts through the use of concept hierarchies. For example, categorical attributes,
like street, can be generalized to higher-level concepts, like city or country. Similarly, values
for numerical attributes, like age, may be mapped to higher-level concepts, like youth,
middle-aged, and senior.
• Normalization, where the attribute data are scaled so as to fall within a small specified
range, such as -1.0 to 1.0, or 0.0 to 1.0.
• Attribute construction (or feature construction), where new attributes are constructed and
added from the given set of attributes to help the mining process.

In short, data transformation covers the following steps:

1. Smoothing: Smoothing is the process of removing noise from the data. Techniques include
binning, regression, and clustering.
2. Aggregation: Aggregation is the process of applying summary or aggregation operations to
the data.
3. Generalization: Low-level data are replaced with higher-level concepts by climbing a
concept hierarchy.
4. Normalization: Normalization scales attribute data so that they fall within a small specified
range, such as 0.0 to 1.0.
5. Attribute construction: New attributes are constructed from the given set of attributes.
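
As a rough illustration of three of these steps, the sketch below shows min-max normalization, generalization of age through a small concept hierarchy, and aggregation of daily sales into monthly totals. The function names, the age cut-offs (30 and 60), and the sample sales figures are assumptions made for this example only; they do not come from the text above.

```python
from collections import defaultdict


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map every one to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def generalize_age(age):
    """Map a numeric age to a higher-level concept (cut-offs are assumed)."""
    if age < 30:
        return "youth"
    if age < 60:
        return "middle-aged"
    return "senior"


def aggregate_monthly(daily_sales):
    """Sum (date, amount) pairs into totals keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[day[:7]] += amount  # 'YYYY-MM-DD' -> 'YYYY-MM'
    return dict(totals)


# Hypothetical sample data
sales = [("2024-01-05", 100.0), ("2024-01-20", 50.0), ("2024-02-03", 75.0)]
print(min_max_normalize([20, 30, 40, 60]))
print([generalize_age(a) for a in (22, 45, 70)])
print(aggregate_monthly(sales))
```

Passing `new_min=-1.0` rescales to the other range mentioned above, -1.0 to 1.0.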

Data reduction is the process of reducing the amount of capacity required to store data. It
can increase storage efficiency and reduce costs. Data reduction techniques can be
applied to obtain a reduced representation of the data set that is much smaller in volume yet
closely maintains the integrity of the original data. Some of the techniques are:

1. Data cube aggregation, where aggregation operations are applied to the data in the construction
of a data cube.

2. Attribute subset selection, where irrelevant, weakly relevant, or redundant attributes or
dimensions may be detected and removed.

3. Dimensionality reduction, where encoding mechanisms are used to reduce the data set size.

4. Numerosity reduction, where the data are replaced or estimated by alternative, smaller data
representations such as parametric models or sampling.

5. Discretization and concept hierarchy generation, where data values for attributes are replaced by
ranges or higher conceptual levels.
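
A minimal sketch of three of these techniques follows, assuming the data set is held as a list of Python dictionaries or numbers. Sampling stands in for numerosity reduction, equal-width binning for discretization, and dropping constant attributes is a deliberately crude stand-in for attribute subset selection. All function names and sample rows are hypothetical.

```python
import random


def simple_random_sample(rows, n, seed=0):
    """Numerosity reduction: keep n rows drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(rows, n)


def equal_width_bins(values, k):
    """Discretization: replace each value by the index (0..k-1) of its
    equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def drop_constant_attributes(rows):
    """Crude attribute subset selection: remove attributes that take a
    single value in every row, since they cannot distinguish rows."""
    keep = [key for key in rows[0] if len({row[key] for row in rows}) > 1]
    return [{key: row[key] for key in keep} for row in rows]


# Hypothetical sample data
rows = [
    {"id": 1, "country": "US", "age": 30},
    {"id": 2, "country": "US", "age": 41},
]
print(drop_constant_attributes(rows))
print(equal_width_bins([0, 5, 10, 15, 20], 4))
print(len(simple_random_sample(list(range(100)), 10)))
```

Practical attribute selection usually relies on relevance or redundancy measures rather than this constant-value check; the check is used here only because it fits in a few lines.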

Web mining is the process of using data mining techniques to extract information directly from
the web, that is, from web documents and their content.

Some of the challenges are:

1. Noisy and incomplete data: Web data are heterogeneous, incomplete, and noisy. Data in large
quantities are normally unreliable. These problems may be due to errors in the instruments
that measure the data or to human error.
2. Distributed data: A huge number of documents are distributed across the digital library
of the web.
3. Complexity of web pages: Web pages do not have a unifying structure. They are very
complex compared with traditional text documents.
4. Diversity of user communities: The user community on the web is rapidly expanding. These
users have different backgrounds, interests, and usage purposes.
5. The web is a dynamic source: The information on the web is rapidly updated. Data such as
news and weather reports are updated regularly.

The web mining techniques are:

1. Web content mining: Web content mining is the process of extracting useful information
from the content of web documents. Web content consists of several types of data: text,
image, audio, video, etc.
2. Web structure mining: Web structure mining is the process of discovering structure
information from the web. The structure of the web graph consists of web pages as nodes
and hyperlinks as edges connecting related pages.
3. Web usage mining: Web usage mining is the process of identifying or discovering
interesting usage patterns from large data sets. These patterns help in understanding
user behaviour.
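
The web graph described under web structure mining can be sketched as follows, with pages as nodes and hyperlinks as directed edges. The page names and helper functions are hypothetical. Counting incoming links is shown only as one simple piece of structure information that such a graph makes available.

```python
def build_web_graph(links):
    """Return a dict mapping each page to the set of pages it links to."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        # Pages without outgoing links must still appear as nodes.
        graph.setdefault(target, set())
    return graph


def in_degree(graph):
    """Count how many hyperlinks point at each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical hyperlinks as (source page, target page) pairs
links = [
    ("home", "news"),
    ("home", "about"),
    ("news", "about"),
    ("about", "home"),
]
graph = build_web_graph(links)
print(in_degree(graph))
```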

Biological sequences generally refer to sequences of nucleotides or amino acids. Biological sequence
analysis compares, aligns, indexes, and analyzes biological sequences and thus plays a crucial role in
bioinformatics and modern biology.

Some methods are:

1. Dot matrix analysis is primarily a method for comparing two sequences to look for possible
alignments of characters between the sequences.

2. Dynamic programming is widely used in bioinformatics for tasks such as sequence alignment,
protein folding, RNA structure prediction, and protein-DNA binding.

3. The word method is used in the database search tools FASTA and the BLAST family. These tools
identify a series of short, non-overlapping subsequences (words) of the query sequence, which are
then matched to candidate database sequences to produce the result.
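
The three methods can be illustrated with a small sketch. The dynamic programming example uses the Needleman-Wunsch recurrence for global alignment, and its scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. The `words` helper follows the description above and is not how FASTA or BLAST are actually implemented. All function names are hypothetical.

```python
def dot_matrix(a, b):
    """Dot matrix: cell [i][j] is True where a[i] equals b[j]."""
    return [[x == y for y in b] for x in a]


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score (Needleman-Wunsch dynamic programming)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


def words(seq, k):
    """Split seq into non-overlapping words of length k; a shorter
    trailing remainder is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


print(dot_matrix("AC", "CA"))
print(global_alignment_score("ACGT", "AGT"))
print(words("ACGTACGTA", 3))
```

Diagonal runs of True cells in the dot matrix correspond to stretches where the two sequences match without gaps.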
Multimedia data mining refers to the analysis of large amounts of multimedia information in order
to find patterns or statistical relationships. Once data is collected, computer programs are used to
analyse it and look for meaningful connections.

The categories are:

1. Text mining: Text mining, also referred to as text data mining, is used to find meaningful
information in unstructured text from various sources. It evaluates large amounts of
natural-language text and detects patterns in it to find useful information.
2. Image mining: Image mining systems can discover meaningful information or image patterns
in a huge collection of images.
3. Video mining: Video mining aims to find interesting patterns in large amounts of video
data. Video is itself multimedia data, combining text, image, metadata, visual, and audio
content.
4. Audio mining: Audio mining, which plays an important role in multimedia applications, is a
technique by which the content of an audio signal can be automatically analyzed and searched.
