
Data Mining

By Archana Ketkar

What Is Data Mining?


Data mining is the process of sorting through large amounts of
data and picking out relevant information.
In other words:
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown,
and potentially useful) patterns or knowledge from huge amounts
of data

Other names

Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.

Some Definitions
Data: Data are any facts, numbers, or text that can
be processed by a computer.
Operational or transactional data, such as sales, cost,
inventory, payroll, and accounting
Nonoperational data, such as industry sales, forecast
data, and macroeconomic data
Metadata: data about the data itself, such as logical
database design or data dictionary definitions

Information: The patterns, associations, or
relationships among all this data can provide
information.

Definitions Continued..
Knowledge: Information can be converted into
knowledge about historical patterns and future
trends. For example, summary information on retail
supermarket sales can be analyzed in terms of
promotional efforts to provide knowledge of
consumer buying behavior. Thus, a manufacturer or
retailer could determine which items are most
susceptible to promotional efforts.

Data Warehouses: Data warehousing is defined as a
process of centralized data management and
retrieval.

Data Warehouse example

Data Rich, Information Poor

Data Mining process

Knowledge discovery from data
KDD process includes:
data cleaning (to remove noise and inconsistent data)
data integration (where multiple data sources may be
combined)
data selection (where data relevant to the analysis task are
retrieved from the database)
data transformation (where data are transformed or
consolidated into forms appropriate for mining by performing
summary or aggregation operations)

KDD continued
data mining (an essential process where intelligent
methods are applied in order to extract data patterns)
pattern evaluation (to identify the truly interesting
patterns representing knowledge based on some
interestingness measures)
knowledge presentation (where visualization and
knowledge representation techniques are used to
present the mined knowledge to the user)

Data mining is the core of the knowledge discovery process.
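
The KDD steps listed above can be sketched as a tiny pipeline. This is only an illustration in plain Python, not a production workflow: the record layout, the two sources (`store_a`, `store_b`), the cleaning rule, and every function name are assumptions made for this example.

```python
from collections import Counter

# Hypothetical raw records from two sources: (customer, item, amount).
store_a = [("ann", "computer", 900.0), ("bob", "software", None), ("ann", "software", 60.0)]
store_b = [("cid", "computer", 1100.0), ("cid", "software", 80.0), ("dee", "printer", -5.0)]

def clean(records):
    """Data cleaning: drop records with missing or negative amounts."""
    return [r for r in records if r[2] is not None and r[2] >= 0]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, items):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r[1] in items]

def transform(records):
    """Data transformation: aggregate into one item set per customer."""
    baskets = {}
    for customer, item, _ in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Data mining: count how often each item pair occurs together."""
    pairs = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs[(first, second)] += 1
    return pairs

def evaluate(pairs, min_count):
    """Pattern evaluation: keep pairs that meet a minimum count."""
    return {pair: n for pair, n in pairs.items() if n >= min_count}

combined = integrate(clean(store_a), clean(store_b))
baskets = transform(select(combined, {"computer", "software"}))
patterns = evaluate(mine(baskets), min_count=2)

# Knowledge presentation: report the surviving patterns to the user.
for (first, second), n in patterns.items():
    print(f"{first} + {second}: bought together by {n} customers")
```

Each function corresponds to one named step, so a step can be swapped out (for example, a stricter cleaning rule) without touching the others.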

Knowledge Discovery (KDD) Process

[Figure: KDD process diagram. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]

Functionalities/Techniques:
Concept/Class Description: Characterization and Discrimination
Mining Frequent Patterns, Associations and Correlations
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis

Concept/Class Description: Characterization and Discrimination
Data Characterization: A data mining system
should be able to produce a description
summarizing the characteristics of a target class,
such as a group of customers.
Example: The characteristics of customers
who spend more than $1000 a year at a store
called AllElectronics. The result can be
a general profile covering attributes such as
age, employment status, or credit rating.
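
A characterization of this kind can be sketched in a few lines of Python. The customer records, field names, and the choice of summary statistics (mean age, most common employment status and credit rating) are assumptions made for illustration; a real system would draw them from the store's database.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; field names and values are invented.
customers = [
    {"age": 42, "employment": "employed", "credit": "excellent", "spend": 1500},
    {"age": 38, "employment": "employed", "credit": "good", "spend": 1200},
    {"age": 45, "employment": "self-employed", "credit": "excellent", "spend": 2100},
    {"age": 23, "employment": "student", "credit": "fair", "spend": 300},
]

def characterize(records, min_spend):
    """Summarize the target class: customers spending more than min_spend."""
    target = [r for r in records if r["spend"] > min_spend]
    return {
        "count": len(target),
        "mean_age": mean(r["age"] for r in target),
        "top_employment": Counter(r["employment"] for r in target).most_common(1)[0][0],
        "top_credit": Counter(r["credit"] for r in target).most_common(1)[0][0],
    }

profile = characterize(customers, min_spend=1000)
print(profile)
```

The sketch assumes at least one customer exceeds the threshold; with an empty target class, `mean` raises an error.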

Characterization and Discrimination continued
Data Discrimination: A comparison of the
general features of target class data
objects with the general features of objects
from one or a set of contrasting classes. The user
can specify the target and contrasting classes.
Example: The user may like to compare the
general features of software products whose
sales increased by 10% in the last year with
those whose sales decreased by about 30%
in the same period.
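
The comparison can be sketched as follows. The product records, the features compared (`price`, `rating`), and the exact class thresholds are assumptions for illustration; the user-specified target and contrasting classes are passed in as predicate functions.

```python
from statistics import mean

# Hypothetical product records; "change" is last year's sales change in percent.
products = [
    {"name": "A", "price": 40, "rating": 4.5, "change": 12},
    {"name": "B", "price": 60, "rating": 4.1, "change": 10},
    {"name": "C", "price": 150, "rating": 3.0, "change": -30},
    {"name": "D", "price": 170, "rating": 2.6, "change": -35},
    {"name": "E", "price": 90, "rating": 3.8, "change": 2},
]

def discriminate(records, in_target, in_contrast, features):
    """Compare mean feature values of a target class and a contrasting class."""
    target = [r for r in records if in_target(r)]
    contrast = [r for r in records if in_contrast(r)]
    return {
        f: (mean(r[f] for r in target), mean(r[f] for r in contrast))
        for f in features
    }

comparison = discriminate(
    products,
    in_target=lambda r: r["change"] >= 10,     # assumed cutoff: sales up 10% or more
    in_contrast=lambda r: r["change"] <= -30,  # assumed cutoff: sales down 30% or more
    features=["price", "rating"],
)
for feature, (target_mean, contrast_mean) in comparison.items():
    print(f"{feature}: target={target_mean:.2f}, contrast={contrast_mean:.2f}")
```

Product E belongs to neither class and is ignored, which mirrors the idea that only the specified classes are compared.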

Mining Frequent Patterns, Associations and Correlations
Frequent Patterns: As the name suggests, patterns that occur
frequently in data.
Association Analysis: From a marketing perspective, determining
which items are frequently purchased together within the same
transaction.
Example: A rule mined from the AllElectronics transactional
database:
buys(X, "computer") => buys(X, "software") [support = 1%, confidence = 50%]
X represents a customer.
Confidence = 50% means that if a customer buys a computer, there is a
50% chance that he/she will buy software as well.
Support = 1% means that 1% of all the transactions under
analysis showed that computer and software were purchased
together.
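
Support and confidence can be computed directly from transaction counts. The sketch below uses a small invented transaction list (so the percentages differ from the 1% and 50% in the slide); the function names are chosen for this example only.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
    {"software"},
    {"computer", "printer"},
    {"printer", "paper"},
    {"paper"},
]

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

s = support(transactions, {"computer", "software"})
c = confidence(transactions, {"computer"}, {"software"})
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

Here 2 of the 8 transactions contain both items, and 4 contain a computer, so the rule has 25% support and 50% confidence.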

Mining Frequent Patterns, Associations and Correlations
Another example:
age(X, "20-29") ^ income(X, "20K-29K") => buys(X, "CD player") [support = 2%, confidence = 60%]
For customers between 20 and 29 years of age
with an income of $20,000-$29,000, there is a
60% chance that they will purchase a CD player,
and 2% of all the transactions under analysis
showed customers in this age group and
income range buying a CD player.
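
A rule that combines several attributes, like the one above, can be checked against customer records by treating each side of the rule as a predicate. The records and the helper `rule_stats` are hypothetical, and the resulting numbers come from the invented data, not from the slide.

```python
# Hypothetical customer records; values are invented for illustration.
records = [
    {"age": 24, "income": 25_000, "buys": {"CD player"}},
    {"age": 27, "income": 22_000, "buys": {"CD player", "software"}},
    {"age": 29, "income": 28_000, "buys": {"computer"}},
    {"age": 35, "income": 27_000, "buys": {"CD player"}},
    {"age": 22, "income": 45_000, "buys": {"software"}},
]

def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    matches_if = [r for r in records if antecedent(r)]
    matches_both = [r for r in matches_if if consequent(r)]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence

support, confidence = rule_stats(
    records,
    antecedent=lambda r: 20 <= r["age"] <= 29 and 20_000 <= r["income"] <= 29_000,
    consequent=lambda r: "CD player" in r["buys"],
)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```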

Classification and Prediction


Classification is the process of finding a
model that describes and distinguishes data
classes or concepts for the purpose of being
able to use the model to predict the class of
objects whose class label is unknown.
A classification model can be represented in
various forms, such as:
IF-THEN rules
A decision tree
A neural network
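
To make the IF-THEN form concrete, here is a minimal sketch of a one-attribute rule learner (in the spirit of the simple 1R approach, chosen here for brevity and not taken from the slides). The training data, attribute names, and the class label `buys_computer` are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training data: (attributes, class label).
training = [
    ({"age": "youth", "income": "high"}, "no"),
    ({"age": "youth", "income": "low"}, "no"),
    ({"age": "middle", "income": "high"}, "yes"),
    ({"age": "middle", "income": "low"}, "yes"),
    ({"age": "senior", "income": "low"}, "yes"),
    ({"age": "senior", "income": "high"}, "no"),
    ({"age": "senior", "income": "low"}, "yes"),
]

def learn_rules(data, attribute):
    """For one attribute, map each value to its majority class label."""
    by_value = defaultdict(Counter)
    for attrs, label in data:
        by_value[attrs[attribute]][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}

def errors(data, attribute, rules):
    """Count training examples that the rules misclassify."""
    return sum(1 for attrs, label in data if rules[attrs[attribute]] != label)

def fit(data):
    """Pick the single attribute whose rules make the fewest training errors."""
    attributes = data[0][0].keys()
    best = min(attributes, key=lambda a: errors(data, a, learn_rules(data, a)))
    return best, learn_rules(data, best)

attribute, rules = fit(training)
for value, label in rules.items():
    print(f"IF {attribute} = {value} THEN buys_computer = {label}")

def predict(attrs):
    """Predict the class label of an object whose label is unknown."""
    return rules[attrs[attribute]]
```

Calling `predict({"age": "middle", "income": "low"})` then applies the learned rules to a new object. A decision tree generalizes this idea by testing further attributes within each branch.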

Classification Model

Cluster Analysis
Clustering analyses data objects without
consulting a known class label.
Example: Cluster analysis can be performed
on AllElectronics customer data in order to
identify homogeneous subpopulations of
customers. These clusters may represent
individual target groups for marketing. The
figure on the next slide shows a 2-D plot of
customers with respect to customer locations
in a city.
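
One common way to form such clusters is k-means, sketched below on invented 2-D customer locations. The slides do not name a specific clustering algorithm, so k-means, the coordinates, and the fixed starting centroids are all assumptions made for this example.

```python
from math import dist

# Hypothetical customer locations as (x, y) coordinates in a city grid.
locations = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

def k_means(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if it is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Starting centroids are fixed so the run is repeatable; in practice they
# are usually chosen at random or with a seeding heuristic.
clusters, centroids = k_means(locations, centroids=[(0.0, 0.0), (10.0, 10.0)])
for cluster, center in zip(clusters, centroids):
    print(f"center ({center[0]:.2f}, {center[1]:.2f}): {cluster}")
```

No class labels are used anywhere: the groups emerge only from the distances between points.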

Cluster Analysis

Outlier Analysis
Outlier Analysis: A database may contain data
objects that do not comply with the general behavior
or model of the data. These data objects are outliers.
Example: Detecting fraudulent usage of credit
cards. Outlier analysis may uncover fraudulent
usage of credit cards by detecting purchases of
extremely large amounts for a given account number
in comparison to regular charges incurred by the
same account. Outlier values may also be detected
with respect to the location and type of purchase or
the purchase frequency.
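
The per-account comparison can be sketched with a simple robust rule: flag a charge when it lies many median absolute deviations (MAD) away from that account's median charge. The account data, the MAD-based rule, and the threshold of 5 are assumptions for illustration; real fraud detection uses far richer models.

```python
from statistics import median

# Hypothetical charges per account number; the amounts are invented.
charges = {
    "acct-1": [25.0, 40.0, 32.0, 28.0, 35.0, 2500.0],
    "acct-2": [300.0, 280.0, 310.0, 295.0],
}

def find_outliers(amounts, threshold=5.0):
    """Flag amounts far from the account's median, measured in MAD units."""
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)  # median absolute deviation
    if mad == 0:
        return []  # at least half the charges equal the median; ratio undefined
    return [a for a in amounts if abs(a - center) / mad > threshold]

suspicious = {acct: find_outliers(amounts) for acct, amounts in charges.items()}
print(suspicious)
```

Because each account is judged against its own history, a 300.00 charge is normal for `acct-2` even though it would stand out on `acct-1`.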

Evolution Analysis
Evolution Analysis: Data evolution analysis describes
and models regularities or trends for objects whose
behavior changes over time.
Example: Time-series data. Suppose stock market data
(time series) for the last several years are available from
the New York Stock Exchange, and one would like to
invest in shares of high-tech industrial companies. A
data mining study of stock exchange data may
identify stock evolution regularities for overall stocks
and for the stocks of particular companies. Such
regularities may help predict future trends in stock
market prices, contributing to one's decision making
regarding stock investments.
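
Two basic tools for describing a trend in a time series are a moving average and a least-squares slope. The sketch below applies both to an invented price series; the data and function names are assumptions, and a positive slope only describes the past values, not a guaranteed future.

```python
from statistics import mean

# Hypothetical yearly closing prices for one stock; values are invented.
prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0]

def moving_average(series, window):
    """Smooth a series by averaging each run of `window` consecutive values."""
    return [mean(series[i:i + window]) for i in range(len(series) - window + 1)]

def trend_slope(series):
    """Least-squares slope of the series against its time index 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator

smoothed = moving_average(prices, window=3)
slope = trend_slope(prices)
print(smoothed)
print(f"average change per period: {slope:.2f}")
```

The moving average removes the year-to-year zigzag, and the slope summarizes the overall direction as an average change per period.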

References:
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2006.
http://www.eco.utexas.edu/~norman/BUS.FOR/course.mat/Alex/#1
http://en.wikipedia.org/wiki/Data_mining
http://www-faculty.cs.uiuc.edu/~hanj/bk2/

Thank you!
