Archana Data Mining

Data Mining
By Archana Ketkar
What Is Data Mining?

Data mining is the process of sorting through large amounts of
data and picking out relevant information.
In other words
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
Other names
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Some Definitions
Data: Data are any facts, numbers, or text that can be processed by a computer.
operational or transactional data, such as sales, cost, inventory, payroll, and accounting
nonoperational data, such as industry sales, forecast data, and macroeconomic data
meta data: data about the data itself, such as logical database design or data dictionary definitions
Information: The patterns, associations, or relationships among all this data can provide information.
Definitions, continued
Knowledge: Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in terms of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
Data Warehouses: Data warehousing is defined as a process of centralized data management and retrieval.
Data Warehouse example
Data Rich, Information Poor
Data Mining process
Knowledge discovery from data
The KDD process includes
data cleaning (to remove noise and inconsistent data)
data integration (where multiple data sources may be combined)
data selection (where data relevant to the analysis task are retrieved from the database)
data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations)
KDD, continued
data mining (an essential process where intelligent methods are applied in order to extract data patterns)
pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
Data mining is the core of the knowledge discovery process
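The preprocessing steps listed above (cleaning, integration, selection, transformation) can be sketched roughly as follows. This is only an illustration: the record layout, the sample values, and the function names are assumptions made for this example, and treating exact duplicates as noise is a simplification.

```python
# Hypothetical raw records from two sources; all names and values are made up.
store_sales = [
    {"customer": "c1", "item": "computer", "amount": 900.0},
    {"customer": "c2", "item": "software", "amount": None},    # missing value
    {"customer": "c1", "item": "software", "amount": 100.0},
]
web_sales = [
    {"customer": "c3", "item": "computer", "amount": 1200.0},
    {"customer": "c3", "item": "computer", "amount": 1200.0},  # duplicate record
]

def clean(records):
    """Data cleaning: drop records with missing amounts and exact duplicates."""
    seen, result = set(), []
    for r in records:
        key = (r["customer"], r["item"], r["amount"])
        if r["amount"] is None or key in seen:
            continue
        seen.add(key)
        result.append(r)
    return result

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, item):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["item"] == item]

def transform(records):
    """Data transformation: aggregate the amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

combined = integrate(clean(store_sales), clean(web_sales))
task_data = transform(select(combined, "computer"))
print(task_data)
```

The resulting per-customer totals are the kind of task-relevant data that the data mining step would then work on.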
Knowledge Discovery (KDD) Process
[Figure: the KDD process. Labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation. Caption: data mining is the core of the knowledge discovery process.]
Data Mining: Confluence of Multiple Disciplines
[Figure: disciplines contributing to Data Mining: Database Technology, Machine Learning, Pattern Recognition, Statistics, Algorithm, Visualization, Other Disciplines.]
Functionalities/Techniques:
Concept/Class Description: Characterization and Discrimination
Mining Frequent Patterns, Associations and Correlations
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
Concept/Class Description: Characterization and Discrimination
Data Characterization: A data mining system should be able to produce a description summarizing the characteristics of a target class, such as a group of customers.
Example: Summarize the characteristics of customers who spend more than $1000 a year at a store called AllElectronics. The result can be a general profile of these customers, covering attributes such as age, employment status, or credit rating.
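A minimal sketch of such a characterization, assuming a small in-memory list of customer records. The field names and values are hypothetical, and the profile here is just a mean age plus the most common employment status and credit rating.

```python
from collections import Counter

# Hypothetical customer records; the fields and values are invented.
customers = [
    {"age": 34, "employment": "employed", "credit": "good", "yearly_spend": 1500},
    {"age": 45, "employment": "employed", "credit": "excellent", "yearly_spend": 2200},
    {"age": 23, "employment": "student", "credit": "fair", "yearly_spend": 300},
    {"age": 52, "employment": "self-employed", "credit": "good", "yearly_spend": 1100},
]

def characterize(records, threshold):
    """Summarize the customers whose yearly spend exceeds the threshold."""
    target = [r for r in records if r["yearly_spend"] > threshold]
    if not target:
        return None
    return {
        "count": len(target),
        "mean_age": sum(r["age"] for r in target) / len(target),
        # most_common(1) returns the single most frequent value and its count
        "employment": Counter(r["employment"] for r in target).most_common(1)[0][0],
        "credit": Counter(r["credit"] for r in target).most_common(1)[0][0],
    }

profile = characterize(customers, 1000)
print(profile)
```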
Characterization and Discrimination, continued
Data Discrimination: A comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. The user can specify the target and contrasting classes.
Example: The user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by about 30% in the same period.
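The comparison in this example could be sketched as below. The product records, the choice of price as the compared feature, and the exact thresholds used to define the two classes are assumptions made for illustration.

```python
# Hypothetical software products with their yearly change in sales.
products = [
    {"name": "A", "category": "games", "price": 40, "sales_change": 0.12},
    {"name": "B", "category": "games", "price": 50, "sales_change": 0.15},
    {"name": "C", "category": "office", "price": 200, "sales_change": -0.30},
    {"name": "D", "category": "office", "price": 180, "sales_change": -0.35},
    {"name": "E", "category": "utilities", "price": 30, "sales_change": 0.02},
]

def mean_price(records):
    """Average price of a group of products."""
    return sum(r["price"] for r in records) / len(records)

# Target class: sales grew by at least 10%.
target = [p for p in products if p["sales_change"] >= 0.10]
# Contrasting class: sales fell by at least 30%.
contrast = [p for p in products if p["sales_change"] <= -0.30]

comparison = {
    "target_mean_price": mean_price(target),
    "contrast_mean_price": mean_price(contrast),
    "target_categories": sorted({p["category"] for p in target}),
    "contrast_categories": sorted({p["category"] for p in contrast}),
}
print(comparison)
```

Products that fall into neither class (product E here) are simply left out of the comparison.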
Mining Frequent Patterns, Associations and Correlations
Frequent Patterns: As the name suggests, patterns that occur frequently in data.
Association Analysis: From a marketing perspective, determining which items are frequently purchased together within the same transaction.
Example: A rule mined from the transactional database of a store called AllElectronics.
buys(X, computer) => buys(X, software) [support = 1%, confidence = 50%]
X represents a customer.
Confidence = 50% means that if a customer buys a computer, there is a 50% chance that he/she will buy software as well.
Support = 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together.
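Support and confidence as described above can be computed directly from a list of transactions. The sketch below uses a tiny invented data set, so the resulting numbers differ from the 1% and 50% in the example rule; the function names are chosen for this illustration only.

```python
# Each transaction is the set of items bought together (invented data).
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software", "printer"},
]

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(rule_support, rule_confidence)
```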
Mining Frequent Patterns, Associations and Correlations
Another example:
age(X, 20-29) ^ income(X, 20K-29K) => buys(X, CD player) [support = 2%, confidence = 60%]
For customers aged 20 to 29 with an income of $20,000 to $29,000, there is a 60% chance that they will purchase a CD player, and 2% of all the transactions under analysis showed customers in this age group and income range buying a CD player.
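A rule over several attributes can be checked the same way by treating each side of the rule as a predicate on a customer record. The records below are invented, and the inclusive range boundaries are an assumption.

```python
# Hypothetical customer records with what each customer bought.
customers = [
    {"age": 25, "income": 24000, "buys": {"CD player"}},
    {"age": 27, "income": 21000, "buys": {"TV"}},
    {"age": 41, "income": 55000, "buys": {"CD player"}},
    {"age": 22, "income": 28000, "buys": {"CD player", "TV"}},
    {"age": 35, "income": 26000, "buys": set()},
]

def antecedent(c):
    """age(X, 20-29) ^ income(X, 20K-29K), with inclusive bounds."""
    return 20 <= c["age"] <= 29 and 20000 <= c["income"] <= 29000

def consequent(c):
    """buys(X, CD player)"""
    return "CD player" in c["buys"]

matching = [c for c in customers if antecedent(c)]
both = [c for c in matching if consequent(c)]

rule_support = len(both) / len(customers)
rule_confidence = len(both) / len(matching) if matching else 0.0
print(rule_support, rule_confidence)
```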
Classification and Prediction
Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown.
A classification model can be represented in various forms, such as
IF-THEN rules
A decision tree
A neural network
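As an illustration of the IF-THEN rule form, the sketch below applies two hand-written rules to predict a class label. The rules, attribute names, and class labels are hypothetical; in practice such rules would be learned from labeled training data rather than written by hand.

```python
def classify(customer):
    """Predict a class label with hypothetical IF-THEN rules."""
    # IF age < 30 AND income is high THEN likely buyer
    if customer["age"] < 30 and customer["income"] == "high":
        return "likely buyer"
    # IF age >= 30 AND credit rating is good THEN likely buyer
    if customer["age"] >= 30 and customer["credit"] == "good":
        return "likely buyer"
    # Default class when no rule fires
    return "unlikely buyer"

new_customer = {"age": 25, "income": "high", "credit": "fair"}
print(classify(new_customer))
```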
Classification Model
Cluster Analysis
Clustering analyses data objects without consulting a known class label.
Example: Cluster analysis can be performed on AllElectronics customer data in order to identify homogeneous subpopulations of customers. These clusters may represent individual target groups for marketing. The figure on the next slide shows a 2-D plot of customers with respect to customer locations in a city.
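The slides do not name a clustering algorithm. As one common choice, the sketch below runs a basic k-means on invented 2-D customer locations; the starting centroids and the number of iterations are fixed by assumption to keep the example deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]   # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer locations (x, y) in a city.
locations = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
             (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(locations, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

No class labels are used anywhere: the two groups emerge only from the distances between points.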
Cluster Analysis
Outlier Analysis
Outlier Analysis: A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.
Example: Detecting fraudulent usage of credit cards. Outlier analysis may uncover fraudulent usage by detecting purchases of extremely large amounts for a given account number in comparison to the regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase or the purchase frequency.
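One simple way to flag unusually large charges on an account is to compare each amount with the account's median charge. The sketch below uses the median absolute deviation (MAD) with a cutoff factor of 5; both the method and the cutoff are assumptions for illustration, not something the slides prescribe.

```python
from statistics import median

def flag_outliers(amounts, k=5.0):
    """Return the amounts that lie more than k MADs from the median."""
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        # No measurable spread in the charges; this sketch flags nothing.
        return []
    return [a for a in amounts if abs(a - center) > k * mad]

# Hypothetical charges on one account; the last one is far out of line.
charges = [20.0, 35.0, 25.0, 40.0, 30.0, 5000.0]
suspicious = flag_outliers(charges)
print(suspicious)
```

The median is used instead of the mean because a single extreme charge would otherwise shift the baseline it is being compared against.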
Evolution Analysis
Evolution Analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
Example: Time-series data. Suppose stock market data (time series) of the last several years is available from the New York Stock Exchange and one would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to one's decision making regarding stock investments.
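A very simple way to expose a trend in a time series is to smooth it with a moving average. The prices and the window size below are invented, and this is only one basic technique among many used in evolution analysis.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical closing prices of one stock over seven periods.
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 15.0]
smoothed = moving_average(prices, 3)

# Compare the first and last smoothed values for a rough trend direction.
trend = "up" if smoothed[-1] > smoothed[0] else "down or flat"
print(smoothed, trend)
```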
References:
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2006.
http://www.eco.utexas.edu/~norman/BUS.FOR/course.mat/Alex/#1
http://en.wikipedia.org/wiki/Data_mining
http://www-faculty.cs.uiuc.edu/~hanj/bk2/
Thank you!
