By

J. Swathi & D. Divya


GMR Institute Of Technology,
Rajam, Srikakulam District-532 127,
Andhra Pradesh.
E-mail: swathijallepalli@gmail.com

divya1986@gmail.com
ABSTRACT

In this world of exponential growth of data, accessing the desired information, or extracting knowledge from data, is called Data Mining or KDD (Knowledge Discovery in Databases). KDD has been mostly used by artificial intelligence and machine learning researchers. This paper looks at how data is analyzed from different perspectives to find relationships and patterns among dozens of fields in large relational databases, using the latest trends and methods.

A Data Warehouse is a repository of data gathered from multiple sources and stored under a unified schema at a single site. In this paper, we discuss Data Warehouse design using star and snowflake schemas. The Star schema is the one most frequently used, since it has more advantages than the other schemas. Snowflake schemas normalize dimensions to eliminate redundancy.

Both Data Mining and Data Warehousing are important in the present competitive market, with applications such as Customer Retention, Marketing, Risk Assessment, Fraud Detection and others.

Introduction

In today’s fiercely competitive marketplace, companies have an insatiable need for information. Customer data, financial data and Internet click-stream data are a powerful asset, provided they can be integrated and utilized to enhance customer experiences.

The ability to access meaningful data, and to move and share it throughout an organization, between departments, officers and business partners, in a timely and efficient manner through familiar query and analytical tools, is critical.

Definition: A database is a collection of non-redundant data that can be shared between different applications.
What is Data Mining?

Data Mining is defined as “the non-trivial extraction of implicit, previously unknown, potentially useful and understandable knowledge from data”. Data Mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Latest Trends in Technologies and Methods:

A number of Data Mining trends, in terms of technologies and methodologies, are currently being developed and researched. The trends identified include:

Distributed / Collective Data Mining:
Mining information that is located in different places, in different physical locations, is generally known as distributed Data Mining. Distributed Data Mining (DDM) offers an alternative to traditional analysis by combining localized data analysis with a “global data model”.

Spatial and Geographic Data Mining:
Spatial Data Mining is “the extraction of implicit knowledge, spatial relationships or other patterns not explicitly stored in spatial databases.” Its applications include remote sensing, medicine, navigation, and related uses.

Time Series/Sequence Data Mining:
Another important area in Data Mining centers on the mining of time series and sequence-based data, that is, data whose records form an ordered sequence. Sequential pattern mining focuses on identifying recurring sequences in such data.

Hypertext and Hypermedia Data Mining:
Hypertext and Hypermedia Data Mining can be characterized as mining data that includes text, hyperlinks and text markup.

Phenomenal Data Mining:
Phenomenal Data Mining focuses on the relationships between data and the phenomena that are inferred from the data.
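The Distributed Data Mining idea described above, localized analysis at each site combined into a global data model, can be illustrated with a minimal sketch. The site data, the function names `local_analysis` and `global_model`, and the use of simple item support as the "model" are all assumptions made for illustration, not a method taken from the DDM literature.

```python
from collections import Counter

def local_analysis(transactions):
    """Count how often each item appears at one site (the localized analysis)."""
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))  # count each item once per transaction
    return counts, len(transactions)

def global_model(local_results, min_support):
    """Merge per-site counts and keep items whose global support meets the threshold."""
    total_counts = Counter()
    total_transactions = 0
    for counts, n in local_results:
        total_counts.update(counts)
        total_transactions += n
    return {
        item: count / total_transactions
        for item, count in total_counts.items()
        if count / total_transactions >= min_support
    }

# Hypothetical data held at two separate physical locations.
site_a = [["bread", "milk"], ["bread", "eggs"], ["milk"]]
site_b = [["bread"], ["eggs", "milk"], ["bread", "milk"]]

frequent_items = global_model(
    [local_analysis(site_a), local_analysis(site_b)], min_support=0.5
)
```

Only the per-site counts travel to the place where the global model is built; the raw transactions stay where they are, which is the main appeal of this style of analysis.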
Applications of Data Mining:-

Data Mining collects, stores and organizes data for use in areas such as:

• Data Mining and customer relationship management (CRM) software for solving business decision problems
• Privacy of data in insurance companies and government agencies
• Fraud detection in telecommunications and stock exchanges
• Medical diagnosis to detect abnormal patterns
• Airline reservation to maximize seat utilization

What Is Data Warehousing?

A Data Warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It contains historical data derived from transaction data. The characteristics of a Data Warehouse are:

• Subject Oriented
• Integrated
• Non-volatile
• Time Variant

Subject Oriented:- The data in the warehouse is defined in business terms and is grouped under business-oriented subject headings such as customers, products, sales analysis reports and marketing campaigns. This grouping is achieved through data modeling.

Integrated:- Data Warehouses must put data from disparate sources into a consistent format. They must resolve problems such as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Non-volatile:- Once loaded into the Data Warehouse, the data is not updated. It acts as a stable resource for consistent reporting and comparative analysis.

Time-variant:- All data in the Data Warehouse is time stamped at the time of entry into the warehouse, or when it is summarized within the warehouse, so that it acts as a chronological record and allows historical and trend analysis.
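Three of the characteristics above (integrated, non-volatile, time-variant) can be made concrete with a small sketch. Everything named here is hypothetical: the two source systems, their field names, and the `Warehouse` class are assumptions chosen only to show a naming conflict and a unit-of-measure inconsistency being resolved at load time.

```python
from datetime import datetime, timezone

# Hypothetical source records: the two systems disagree on field names and units.
billing_rows = [{"cust_id": 1, "weight_lb": 10.0}]
shipping_rows = [{"customer": 2, "weight_kg": 3.0}]

LB_TO_KG = 0.45359237

def integrate(row):
    """Map a source row onto one consistent format (customer_id, weight_kg)."""
    customer_id = row["cust_id"] if "cust_id" in row else row["customer"]
    if "weight_kg" in row:
        weight_kg = row["weight_kg"]
    else:
        weight_kg = row["weight_lb"] * LB_TO_KG
    return {"customer_id": customer_id, "weight_kg": round(weight_kg, 3)}

class Warehouse:
    """Append-only store: rows are time stamped on entry and never updated."""

    def __init__(self):
        self._rows = []

    def load(self, source_rows):
        loaded_at = datetime.now(timezone.utc)  # time-variant: stamp at entry
        for row in source_rows:
            record = integrate(row)             # integrated: one format
            record["loaded_at"] = loaded_at
            self._rows.append(record)           # non-volatile: append only

    def rows(self):
        # Return copies so callers cannot modify what is stored.
        return [dict(r) for r in self._rows]

warehouse = Warehouse()
warehouse.load(billing_rows)
warehouse.load(shipping_rows)
```

The class deliberately offers no update or delete method; a real warehouse would enforce the same rule through its loading process rather than through a Python class.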
Architecture of Data Warehouse:-

Three common architectures in data warehousing are:

• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic):
The metadata and raw data of a traditional online transaction processing (OLTP) system are present, as is an additional type of data, summary data. A summary in Oracle is called a materialized view.

Data Warehouse Architecture (with a staging area):
Most data warehouses use a staging area, in which operational data is cleaned and processed before it enters the warehouse. A staging area simplifies building summaries and general warehouse management.

Data Warehouse Architecture (with a staging area & Data marts):
We may want to customize the warehouse's architecture for different groups within the organization. We can do this by adding data marts, which are systems designed for a particular line of business.
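The roles of summary data and data marts can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: SQLite has no materialized views, so a plain table built with `CREATE TABLE ... AS SELECT` stands in for the summary, and the table names `sales`, `sales_summary` and `retail_mart` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (line_of_business TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("retail", "north", 100.0),
        ("retail", "south", 50.0),
        ("retail", "north", 25.0),
        ("wholesale", "north", 400.0),
    ],
)

# Summary data: a precomputed aggregate stored as a table
# (a stand-in for what Oracle calls a materialized view).
conn.execute(
    """
    CREATE TABLE sales_summary AS
    SELECT line_of_business, region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY line_of_business, region
    """
)

# Data mart: the slice of the summary that serves one line of business.
conn.execute(
    """
    CREATE TABLE retail_mart AS
    SELECT region, total_amount
    FROM sales_summary
    WHERE line_of_business = 'retail'
    """
)

retail_rows = conn.execute(
    "SELECT region, total_amount FROM retail_mart ORDER BY region"
).fetchall()
```

Queries from the retail group then read the small `retail_mart` table instead of scanning the full `sales` data.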
Processes within a Data Warehouse:-

• Extract and load the data
• Clean and transform data into a form that can cope with large data volumes and provide good query performance
• Backup and archive data
• Manage queries, and direct them to the appropriate data sources

Schemas in Data Warehouse:

A schema is a collection of database objects, including tables, views, indexes, and synonyms. The commonly used schemas are the Star schema and the Snowflake schema.

Star Schema: The star schema is the simplest schema. The entity-relationship diagram of this schema resembles a star. The center of the star consists of a large fact table and the points of the star are the dimension tables. A Star schema is characterized by one or more fact tables and dimension tables.

The main advantages of star schemas are that they:

• Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• Are widely supported by a large number of business intelligence tools.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

Snowflake Schema:
The Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. The diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy.
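A star schema and a star join can be sketched with Python's built-in `sqlite3` module. The tables `fact_sales`, `dim_product` and `dim_date`, their columns, and the sample rows are assumptions made for illustration; they are not taken from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: the points of the star.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_name TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        year INTEGER
    );
    -- Fact table: the center of the star, with one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'lamp', 'home');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 30.0);
    """
)

# Star join: each dimension's primary key is joined to the fact table's foreign key.
star_join = """
    SELECT p.category_name, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category_name, d.year
    ORDER BY p.category_name, d.year
"""
totals = conn.execute(star_join).fetchall()
```

In a snowflake variant of this sketch, `category_name` would move out of `dim_product` into its own category table referenced by a key, which removes the repeated category text at the cost of one more join.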


Conclusion:

This paper has focused on the importance, major trends and methods of Data Mining, as well as the architecture and design of a Data Warehouse using the various schemas involved in managing it effectively.

It would not be overly optimistic to say that Data Mining has a bright and promising future, and that the years to come will bring many new developments, methods, and technologies. The field of Data Mining is still young enough that the possibilities are still limitless. Data Mining tools are continually evolving, building on ideas from the latest scientific research. Many of these tools incorporate the latest algorithms taken from AI, neural networks, statistics and optimization.

A Data Warehouse usually contains historical data derived from transaction data, but it can include data from other sources. The choice of schema model for a Data Warehouse should be based on the requirements and preferences of the Data Warehouse project team. Star schemas are widely supported by a large number of business intelligence tools, whereas Snowflake schemas normalize dimensions to eliminate redundancy.
