Chap 1
Introduction
1960s:
Data collection, database creation, IMS and network
DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)
1990s—2000s:
Data mining and data warehousing, multimedia
databases, and Web databases
What Is Data Mining?
Figure labels: Analysis, Hypothesis, Theorem, Data Collection, Prediction
Why Data Mining? Potential Applications
Database analysis and decision support
Market analysis and management
Target marketing, customer relationship management,
market basket analysis, cross-selling, market
segmentation
Risk analysis and management
Forecasting, customer retention, improved
underwriting, quality control, competitive analysis
Fraud detection and management
Other Applications
Text mining (newsgroups, email, documents) and Web
analysis
Intelligent query answering
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process.
Figure: stages of the KDD process, from bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Data Selection; Data Transformation; Task-relevant Data; Data Mining; Pattern Evaluation
Steps of a KDD Process (1)
Learning the application domain:
relevant prior knowledge and goals of application
Data transformation: data are transformed or consolidated
into forms appropriate for mining, for example by summary or aggregation.
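The aggregation idea in the transformation step can be illustrated with a minimal sketch. The sales records, the field layout, and the function name `aggregate_monthly` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical per-sale records: (region, month, amount).
daily_sales = [
    ("north", "2024-01", 120.0),
    ("north", "2024-01", 80.0),
    ("north", "2024-02", 50.0),
    ("south", "2024-01", 200.0),
]


def aggregate_monthly(records):
    """Consolidate per-sale records into one total per (region, month)."""
    totals = defaultdict(float)
    for region, month, amount in records:
        totals[(region, month)] += amount
    return dict(totals)


monthly_totals = aggregate_monthly(daily_sales)
```

After this step, a mining algorithm would see one consolidated row per region and month instead of many individual sales.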
Figure labels: Databases and Data Warehouse; Data Exploration (Statistical Analysis, Querying and Reporting); Pattern evaluation
Architecture Major Components (1)
Database or other information repository
Knowledge Base
- Domain knowledge
. Interestingness
. Constraints
. Threshold
. Metadata
- Concept Hierarchy
- Decision making
- Process control
- Information management
- Query Processing
Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
Relational Database
A collection of tables, each of which is assigned a unique name.
Each table consists of
Attributes (columns or fields)
Tuples (records or rows)
Data mining in a relational database searches for trends or data patterns.
Relational databases are among the most widely available, information-rich
repositories.
They are therefore a major form of data for data mining.
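A small sketch may help make the table, attribute, and tuple terms concrete. It uses Python's built-in `sqlite3` module with an in-memory database; the `purchases` table, its columns, and its rows are hypothetical examples, and the grouped query stands in for only the simplest kind of pattern search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table has a unique name ("purchases") and a set of attributes (columns).
conn.execute(
    "CREATE TABLE purchases (cust_id INTEGER, category TEXT, amount REAL)"
)

# Each inserted row is one tuple (record).
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "books", 20.0), (1, "music", 15.0), (2, "books", 30.0), (3, "books", 10.0)],
)

# A simple summary over the tuples: purchase count and total spend per category.
rows = conn.execute(
    "SELECT category, COUNT(*), SUM(amount) FROM purchases "
    "GROUP BY category ORDER BY SUM(amount) DESC"
).fetchall()
conn.close()
```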
Transactional Database
Consists of a file in which each record represents a transaction.
Each record includes a unique transaction identity number
and a list of the items included in the transaction.
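As a rough illustration, a transactional database can be modeled as a mapping from transaction ID to item list. The transaction IDs, items, and the helper `pair_counts` below are hypothetical; counting co-occurring item pairs is shown only as one simple analysis such data supports, in the spirit of market basket analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: unique transaction ID -> list of items.
transactions = {
    "T100": ["bread", "milk"],
    "T200": ["bread", "butter", "milk"],
    "T300": ["butter", "jam"],
}


def pair_counts(db):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in db.values():
        # Sort and deduplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
```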
Object-Oriented Database
Based on the object-oriented programming paradigm.
Each object is associated with a set of variables and a set of methods.
Instance.
Inheritance.
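The object-oriented terms above can be sketched in a few lines. The `Employee` and `Manager` classes and their fields are hypothetical; the sketch shows the programming concepts only, not how an object-oriented database stores objects.

```python
class Employee:
    """A class: objects carry variables (name, salary) and methods."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def annual_pay(self):
        return self.salary * 12


class Manager(Employee):
    """Inheritance: Manager reuses Employee's variables and methods."""

    def __init__(self, name, salary, bonus):
        super().__init__(name, salary)
        self.bonus = bonus

    def annual_pay(self):
        return super().annual_pay() + self.bonus


# An instance is one concrete object of a class.
m = Manager("Ana", 5000, 3000)
```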
Object-Relational Database – Extends the basic relational data model
by adding the power to handle:
- Complex data types.
- Class hierarchies.
- Object inheritance.
Spatial Database – Contains spatial-related information.
- Includes: geographic (map) databases, VLSI chip design databases,
and medical and satellite image databases.
- Maps can be represented in vector format.
- Map objects include roads, bridges, buildings, and lakes.
- Mining can be used for:
describing the characteristics of houses located near a
specified kind of location, such as a park.
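The house-and-park example can be sketched as a distance filter followed by a summary. The coordinates, prices, the 1.0-unit radius, and the helper `near` are all hypothetical, and plain Euclidean distance on planar coordinates is an assumption made to keep the sketch short.

```python
import math
from statistics import mean

# Hypothetical planar coordinates and prices.
park = (0.0, 0.0)
houses = [
    {"xy": (0.3, 0.4), "price": 300},
    {"xy": (3.0, 4.0), "price": 200},
    {"xy": (0.0, 1.0), "price": 340},
]


def near(point, target, radius):
    """True if point lies within radius of target (Euclidean distance)."""
    return math.dist(point, target) <= radius


# Select the houses near the park, then describe one of their characteristics.
near_park = [h for h in houses if near(h["xy"], park, 1.0)]
avg_near = mean(h["price"] for h in near_park)
```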
Temporal Database or Time-series Database – Stores time-related data.
- Usually stores relational data that include time-related attributes.
- These attributes may involve several timestamps.
- Time may be decomposed according to fiscal, academic, or calendar
years.
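Decomposing time by fiscal year rather than calendar year can be sketched as follows. The April start month, the function `fiscal_year`, and the sample readings are assumptions for illustration; fiscal calendars differ between organizations.

```python
from datetime import date

FISCAL_START_MONTH = 4  # assumption: the fiscal year starts on 1 April


def fiscal_year(d, start_month=FISCAL_START_MONTH):
    """Return the calendar year in which the fiscal year containing d begins."""
    return d.year if d.month >= start_month else d.year - 1


# Hypothetical timestamped values.
readings = [
    (date(2023, 3, 31), 10),
    (date(2023, 4, 1), 20),
    (date(2024, 1, 15), 30),
]

# Group values by fiscal year instead of calendar year.
by_fiscal = {}
for stamp, value in readings:
    by_fiscal.setdefault(fiscal_year(stamp), []).append(value)
```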
Text Database – Contains word descriptions of objects.
- Examples: error or bug reports, warning messages, summary reports,
notes.
- Unstructured: Web pages.
- Semi-structured: e-mail messages.
- Well structured: library databases.
Figure: Data Mining at the center, surrounded by related fields: Neural Networks, High-Performance Computing, Machine Learning, Data Visualization, Information Science, Information Retrieval, Pattern Recognition, Signal Processing, Spatial Data Analysis
Data Mining: Classification Schemes
General functionality
Descriptive data mining
Predictive data mining
Different views, different classifications
Kinds of databases to be mined
Kinds of knowledge to be discovered
Kinds of techniques utilized
A Multi-Dimensional View of Data Mining Classification
Databases to be mined
Relational, transactional, object-oriented, object-
levels
Techniques utilized
Database-oriented, data warehouse (OLAP), machine
Figure: layered architecture. Layer 1: Data Repository (Databases and Data Warehouse), with data cleaning and data integration. Between the layers: Database API, with filtering and integration. Layer 2: MDDB, with Meta Data.
Major Issues in Data Mining (1)
Mining methodology and user interaction
Mining different kinds of knowledge in databases
- Since different users can be interested in different kinds of knowledge,
data mining should cover a wide spectrum of data analysis and KD tasks.
Interactive mining of knowledge at multiple levels of abstraction
- Since it is difficult to know exactly what can be discovered within a database, the mining process should be interactive.
Incorporation of background knowledge
- Background knowledge, such as integrity constraints and deduction rules, can help focus and
speed up a data mining process.
Data mining query languages and ad-hoc data mining
- Such a language should be integrated with a database or data warehouse query
language, and optimized for efficient and flexible data mining.
Presentation and visualization of data mining results
- Discovered knowledge should be expressed in high-level languages or
visual representations so that it is easily understood and directly usable by
humans, for example as trees, tables, rules, graphs, charts, crosstabs, matrices,
or curves.
Handling noise and incomplete data
- Noise and incomplete data can confuse the mining process; as a result, the accuracy of the discovered patterns can be poor.
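A minimal sketch of why noise and missing values matter: one implausible entry can dominate a simple statistic. The `ages` list, the 0 to 120 plausibility range, and the helper `clean` are hypothetical, and dropping bad values is only one of several possible treatments.

```python
from statistics import mean

# Hypothetical attribute values: None marks a missing entry,
# and 290 is assumed to be a data-entry error (noise).
ages = [25, 31, None, 28, 290, None, 35]


def clean(values, low, high):
    """Drop missing values and values outside the plausible range [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


observed = [v for v in ages if v is not None]
cleaned = clean(ages, 0, 120)

# The single noisy value pulls the uncleaned mean far from the typical age.
noisy_mean = mean(observed)
clean_mean = mean(cleaned)
```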