Data Mining


DATA MINING

ABSTRACT

Data mining has become a popular buzzword but, in fact, promises to revolutionize commercial and scientific exploration. Databases range from millions to trillions of bytes of data. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve.

A data warehouse is a relational database that is designed for query and analysis rather than transaction processing. It usually contains historical data that is derived from transaction data. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. Data in the warehouse can be seen as materialized views generated from the underlying multiple data sources. Materialized views are used to speed up query processing on large amounts of data. These views need to be maintained in response to updates in the source data. This is often done using incremental techniques that access data from underlying sources. In the data-warehousing scenario, accessing base relations can be difficult; sometimes data sources may be unavailable, since these relations are distributed across different sources.

This paper provides an introduction to the basic technologies of data mining, as well as a basic description of how data warehouse architectures can evolve to deliver the value of data mining to end-users. In my view, a new information source based on data mining and warehousing, installed in order to serve the people, is introduced as follows.
TOPICS COVERED IN DATA MINING AND WAREHOUSING

DATA MINING

• Overview
• Data, Information, and Knowledge
• How does data mining work?
• Crucial Concepts in Data Mining
    • Text Mining
    • Stacked Generalization
    • Data Preparation
    • Deployment
    • Drill-Down Analysis
    • Bagging
    • Feature Selection
    • Boosting
    • Machine Learning
• Different levels of analysis
• Models for Data Mining
• The Foundations of Data Mining
• The Scope of Data Mining
• Commonly used techniques
• Architecture
• Profitable Applications
• Advantages
• Disadvantages
• Conclusion

DATA WAREHOUSING

• Overview
• Data warehousing working
• Benefits
• Data Warehouse Options
• Data Warehouse Scope
• Data Marts
• Advantages
• Disadvantages
• Application
• Conclusion

Data warehouse layers:

• Process Management Layer
• Application Messaging Layer
• Data Warehouse (Physical) Layer
• Data Staging Layer
• Information Access Layer
• Data Access Layer
• Data Directory (Metadata) Layer
DATA MINING COMPARED WITH REAL TIME MINING

[Figure: real-world mining compared with data mining. Labels: Gold Mine, Coal Mine, Various Mines, Extractors, Purifiers, Destination; Different shapes of source, Required shape, Utilization.]

Overview

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.

Data mining architecture:

[Figure: data mining architecture. Labels: Graphical user interface, Pattern evaluation, Data mining engine, Database or data warehouse server, Filtering, Databases, Data Warehouse.]

How does data mining work?

• Classes: Stored data is used to locate data in predetermined groups.
• Clusters: Data items are grouped according to logical relationships or consumer preferences.
• Associations: Data can be mined to identify associations.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends.

The Data Mining Process:

[Figure: data mining process flow. Labels: Data sources, Data warehouses, Preprocess data: collecting & cleansing, Search for Patterns, Data Mining, Analyst reviews output, Revise/Refine Queries, Interpret Results, Report Findings, Take action based on findings.]

Data mining consists of five major elements:

• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

• Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

• Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

What technological infrastructure is required?

• Size of the database: The more data being processed and maintained, the more powerful the system required.
• Query complexity: The more complex the queries and the greater the number of queries being processed, the more powerful the system required.

Data Mining Infrastructure:

• Ability to access data from many sources and consolidate it
• Ability to score customers based on existing models
• Ability to manage lots of models over time
• Ability to manage lots of model scores over time
• Ability to track model score changes over time
• Ability to reconstruct a customer “signature” on demand
• Ability to publish scores, rules, and other data mining results

The Foundations of Data Mining:

• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms

The Scope of Data Mining:

• Automated prediction of trends and behaviors
• Automated discovery of previously unknown patterns

The most commonly used techniques in data mining are:

• Artificial neural networks
• Decision trees
• Genetic algorithms
• Nearest neighbor method
• Rule induction
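
As a rough illustration of the genetic algorithm idea listed among the techniques above (genetic combination, mutation, and natural selection), the following sketch evolves bit strings toward the all-ones string. The toy fitness function, the population size, the mutation rate, and all function names are assumptions made for illustration; the paper does not specify any of them.

```python
import random


def fitness(bits):
    """Toy fitness (an assumption for illustration): number of 1-bits."""
    return sum(bits)


def crossover(a, b, rng):
    """Genetic combination: splice two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rate, rng):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(p) for p in pop)
    for _ in range(generations):
        # Natural selection: keep the fitter half as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        # Elitism: carry the best individual over unchanged.
        children = [parents[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = children
    best = max(pop, key=fitness)
    return initial_best, best


initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Because the best individual is copied into each new generation, the best fitness can never decrease from one generation to the next; real applications would replace the toy fitness function with a measure of how well a candidate model or rule fits the data.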
 David Nelson

Profitable Applications:

A wide range of companies has deployed successful applications of data mining. While early adopters of this technology have tended to be in information-intensive industries such as financial services and direct mail marketing, the technology is applicable to any company looking to leverage a large data warehouse to better manage their customer relationships. Two critical factors for success with data mining are: a large, well-integrated data warehouse and a well-defined understanding of the business process within which data mining is to be applied (such as customer prospecting, retention, campaign management, and so on).

Data Warehousing:

• OLTP (online transaction processing) systems
• Range in size from megabytes to terabytes
• High transaction throughput
• Decision makers require access to all data
• A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process

DATA WAREHOUSING ARCHITECTURE:

[Figure: data warehousing architecture. Labels: Operational Environment (OLTP, External, Legacy), Metadata, Extract, Integrate, Transform, Maintain, Data Warehouse, and a truncated label "Ana".]
Benefits:

• Potential high returns on investment
• Competitive advantage
• Data can reveal previously unknown, unavailable and untapped information
• Increased productivity of corporate decision-makers
• Integration allows more substantive, accurate and consistent analysis

Information Flow Processes (five primary uses):

• Inflow - extraction, cleansing and loading of data from source systems into warehouse
• Upflow - adding value to data in warehouse through summarizing, packaging and distributing data
• Downflow - archiving and backing up data in warehouse
• Outflow - making data available to end users
• Metaflow - managing the metadata

Problems of Data Warehousing

• Hidden problems with source systems
• Required data not captured
• Increased end-user demands

Data Marts:

• A subset of a data warehouse that supports the requirements of a particular department or business function
• As data warehouse grows larger, ability to serve needs may be compromised

Extraction, cleansing & Transformation Tools

• Code generators
• Database data replication tools
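
The inflow step described above (extraction, cleansing, and loading) is the work such tools automate. The following is a minimal sketch only, assuming a hypothetical CSV export from a source system and an in-memory SQLite table standing in for the warehouse; the column names, sample rows, and cleansing rules are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source-system export; names and values are made up.
SOURCE_CSV = """customer,amount,date
 alice ,10.50,2024-01-03
BOB,,2024-01-04
carol,7.25,2024-01-05
"""


def extract(text):
    """Extraction: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansing: normalise names and drop rows with a missing amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # required data not captured, so skip the row
        clean.append(
            (row["customer"].strip().lower(), float(row["amount"]), row["date"])
        )
    return clean


def load(conn, rows):
    """Loading: insert cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, cleanse(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping extraction, cleansing, and loading as separate functions mirrors the way the flow is named in the list; a production tool would also log rejected rows rather than silently skipping them.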
Data Warehouse DBMS:

• Load performance & Processing
• Data Quality management
• Query performance
• Scalability

Data Mart Issues:

• Long duration projects
• Complexity of integration
• Underestimation of resources for data loading
• Functionality
• Size
• Load performance
• Users access to data in multiple data marts
• Internet/intranet access
• Administration

Dimensionality Modelling:

• Similar to E-R modelling but with constraints
• Composed of one fact table with a composite primary key
• Dimension tables have a simple primary key which corresponds exactly to one foreign key in the fact table

Star Schemas:

• The most common dimensional model
• A fact table surrounded by dimension tables
• Fact tables
    • Contains FK for each dimension table
    • Large relative to dimension tables
    • Read-only
• Dimension tables
• Reference data
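
A minimal sketch of a star schema along the lines described above, using Python's built-in sqlite3 module. The table and column names (dim_product, dim_store, fact_sales, and so on) and the sample rows are hypothetical; the point is only the shape: one fact table whose composite primary key is built from foreign keys to dimension tables that each have a simple primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Dimension tables: simple primary key plus reference data.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: one foreign key per dimension, together forming
    -- the composite primary key.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER,
        PRIMARY KEY (product_id, store_id)
    );
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)", [(1, "pen"), (2, "ink")]
)
conn.executemany(
    "INSERT INTO dim_store VALUES (?, ?)", [(10, "Chennai"), (20, "Delhi")]
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 20, 3), (2, 10, 4)],
)

# A typical star join: aggregate facts, grouped by a dimension attribute.
units_by_city = conn.execute(
    """
    SELECT s.city, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
    """
).fetchall()
print(units_by_city)
```

Queries against such a schema usually follow this pattern: filter or group on dimension attributes, then aggregate the numeric measures held in the fact table.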

Applications

• Medicine: disease, treatments
• Molecular or pharmaceutical: new drugs
• Security: face recognition, identification
• Judiciary: data on judgment of similar cases
• Biometrics
• Multimedia retrieval
• Scientific data analysis
• Web site or Web store design, and promotion

A new information source based on DATAMINING & WAREHOUSING, in order to serve the people, is as follows:

Not all Indians are familiar with all languages and tourist places, so people always depend on others in those places. This wastes time, and in some cases visitors from other countries face a lot of problems. For this reason they form a bad opinion of Indians. This problem can be solved by a data mining and warehousing model. We will explain the various details of this schema with examples during the paper presentation. We are sure that this schema will help our nation overcome many disadvantages.

Conclusion:

Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
