
Data Mining: Concepts and Techniques

January 14, 2014

Appendix A: An Introduction to Microsoft's OLE DB for Data Mining


Introduction
Overview and design philosophy
Basic components
  Data set components
  Data mining models
Operations on data model
Concluding remarks



Why OLE DB for Data Mining?

An industry standard is critical for data mining development, usage, interoperability, and exchange
OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP
Building mining applications over relational databases is nontrivial
  Need different customized data mining algorithms and methods
  Significant work on the part of application builders
Goal: ease the burden of developing mining applications in large relational databases

Motivation of OLE DB for DM

Facilitate deployment of data mining models
  Generate data mining models
  Store, maintain, and refresh models as data is updated
  Programmatically use the model on other data sets
  Browse models
Enable enterprise application developers to participate in building data mining solutions


Features of OLE DB for DM

Independent of provider or software
Not specialized to any specific mining model
Structured to cater to all well-known mining models
Part of the upcoming release of Microsoft SQL Server 2000


Overview

Core relational engine exposes OLE DB in a language-based API
Analysis server exposes OLE DB OLAP and OLE DB DM

Architecture layers (top to bottom):
  Data mining applications
  OLE DB OLAP/DM
  Analysis Server
  OLE DB
  RDB engine

Maintain SQL metaphor
Reuse existing notions

Key Operations to Support Data Mining Models

Define a mining model
  Attributes to be predicted
  Attributes to be used for prediction
  Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization

DMM As Analogous to A Table in SQL

Create a data mining model object
  CREATE MINING MODEL [model_name]
Insert training data into the model and train it
  INSERT INTO [model_name]
Use the data mining model
  SELECT relation_name.[id], [model_name].[predict_attr]
  Consult DMM content in order to make predictions and browse statistics obtained by the model
Use DELETE to empty/reset the model
Predictions on datasets: prediction join between a model and a data set (tables)
Deploy a DMM by just writing SQL queries!

Two Basic Components

Cases/caseset: input data
  A table or nested tables (for hierarchical data)

Data mining model (DMM): a special type of table
  A caseset is associated with a DMM and meta-info while creating a DMM
  Save mining algorithm and resulting abstraction instead of data itself
  Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP

Flattened Representation of Caseset

Source tables:
  Customers (Customer ID, Gender, Hair Color, Age, Age Prob)
  Product Purchases (Customer ID, Product Name, Quantity, Product Type)
  Car Ownership (Customer ID, Car, Car Prob)

Flattened join of the three tables:

CID  Gend  Hair   Age  Age prob  Prod  Quan  Type  Car  Car prob
1    Male  Black  35   100%      TV    1     Elec  Car  100%
1    Male  Black  35   100%      VCR   1     Elec  Car  100%
1    Male  Black  35   100%      Ham   6     Food  Car  100%
1    Male  Black  35   100%      TV    1     Elec  Van  50%
1    Male  Black  35   100%      VCR   1     Elec  Van  50%
1    Male  Black  35   100%      Ham   6     Food  Van  50%

Problem: Lots of replication!
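
The replication can be reproduced with a small Python sketch. The record layout and the flatten helper are illustrative assumptions, not part of OLE DB for DM; the point is only that one customer with three purchases and two cars yields 3 x 2 = 6 flattened rows, each repeating the customer attributes.

```python
from itertools import product

# Hypothetical in-memory copies of the slide's sample data for customer 1.
customer = {"cid": 1, "gender": "Male", "hair": "Black", "age": 35}
purchases = [("TV", 1, "Elec"), ("VCR", 1, "Elec"), ("Ham", 6, "Food")]
cars = [("Car", 1.0), ("Van", 0.5)]


def flatten(customer, purchases, cars):
    """Pair every purchase with every car, as a plain relational join would."""
    rows = []
    for (car, car_prob), (prod, quan, ptype) in product(cars, purchases):
        rows.append({
            **customer,  # customer attributes are copied into every row
            "prod": prod, "quan": quan, "type": ptype,
            "car": car, "car_prob": car_prob,
        })
    return rows


flat_rows = flatten(customer, purchases, cars)
print(len(flat_rows))  # 6 rows for a single customer
```

Adding a third nested table with k rows per customer would multiply the row count by k again, which is why the flattened form scales poorly for hierarchical data.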


Logical Nested Table Representation of Caseset

Use Data Shaping Service to generate a hierarchical rowset
  Part of Microsoft Data Access Components (MDAC) products

CID  Gend  Hair   Age  Age prob  Product Purchases    Car Ownership
                                 Prod  Quan  Type     Car  Car prob
1    Male  Black  35   100%      TV    1     Elec     Car  100%
                                 VCR   1     Elec     Van  50%
                                 Ham   6     Food
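
As a rough illustration of what a hierarchical rowset looks like, the following Python sketch groups child rows under their parent customer. It is an assumption-laden analogy, not the Data Shaping Service itself: the tuple layouts and the build_cases helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat source tables, keyed by customer ID.
customers = [(1, "Male", "Black", 35)]
sales = [(1, "TV", 1, "Elec"), (1, "VCR", 1, "Elec"), (1, "Ham", 6, "Food")]
car_ownership = [(1, "Car", 1.0), (1, "Van", 0.5)]


def build_cases(customers, sales, car_ownership):
    """Return one nested case per customer, with child rows as lists."""
    purchases_by_cid = defaultdict(list)
    for cid, prod, quan, ptype in sales:
        purchases_by_cid[cid].append({"prod": prod, "quan": quan, "type": ptype})

    cars_by_cid = defaultdict(list)
    for cid, car, prob in car_ownership:
        cars_by_cid[cid].append({"car": car, "car_prob": prob})

    cases = []
    for cid, gender, hair, age in customers:
        cases.append({
            "cid": cid, "gender": gender, "hair": hair, "age": age,
            "product_purchases": purchases_by_cid.get(cid, []),
            "car_ownership": cars_by_cid.get(cid, []),
        })
    return cases


cases = build_cases(customers, sales, car_ownership)
print(len(cases), len(cases[0]["product_purchases"]), len(cases[0]["car_ownership"]))
```

Each customer appears exactly once, and the customer attributes are stored once rather than once per purchase and car combination.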


More About Nested Table

Not necessary for the storage subsystem to support nested records
Cases are only instantiated as nested rowsets prior to training/predicting data mining models
Same physical data may be used to generate different casesets


Defining A Data Mining Model

The name of the model
The algorithm and parameters
The columns of caseset and the relationships among columns
Source columns and prediction columns


Example
CREATE MINING MODEL [Age Prediction]                     -- name of model
(
    [Customer ID]       LONG   KEY,                      -- source column
    [Gender]            TEXT   DISCRETE,                 -- source column
    [Age]               DOUBLE DISCRETIZED() PREDICT,    -- prediction column
    [Product Purchases] TABLE                            -- source column
    (
        [Product Name]  TEXT   KEY,                      -- source column
        [Quantity]      DOUBLE NORMAL CONTINUOUS,        -- source column
        [Product Type]  TEXT   DISCRETE
                        RELATED TO [Product Name]        -- source column
    )
)
USING [Decision_Trees_101]                               -- mining algorithm used

Column Specifiers

KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
  PROBABILITY: [0, 1]
  VARIANCE
  SUPPORT
  PROBABILITY-VARIANCE
  ORDER
TABLE

Attribute Types

DISCRETE
ORDERED
CYCLICAL
CONTINUOUS
DISCRETIZED
SEQUENCE_TIME
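
To make the DISCRETIZED type concrete, here is a minimal Python sketch that turns a continuous attribute such as Age into bucket indices. Equal-width bucketing, the 0 to 100 range, and the discretize function are all assumptions made for illustration; a provider is free to choose a different discretization method.

```python
def discretize(value, low, high, buckets):
    """Map a continuous value to an equal-width bucket index in 0..buckets-1."""
    if buckets < 1 or high <= low:
        raise ValueError("need buckets >= 1 and high > low")
    clamped = min(max(value, low), high)   # keep out-of-range values in the edge buckets
    width = (high - low) / buckets
    index = int((clamped - low) / width)
    return min(index, buckets - 1)         # the upper bound belongs to the last bucket


ages = [18, 35, 52, 80]
age_buckets = [discretize(a, 0, 100, 5) for a in ages]
print(age_buckets)  # [0, 1, 2, 4]
```

After this step the attribute can be treated like a DISCRETE one, with the bucket index standing in for the original value.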


Populating A DMM
Use the INSERT INTO statement
Consuming a case using the data mining model
Use the SHAPE statement to create the nested table from the input data


Example: Populating a DMM


INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
(
    {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
    RELATE [Customer ID] TO [CustID]
) AS [Product Purchases]

Using Data Model to Predict

Prediction join
  Prediction on dataset D using DMM M
  Different from an equi-join
DMM: a truth table
SELECT statement associated with PREDICTION JOIN specifies values extracted from DMM
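
The truth-table view can be sketched in Python. This is a toy analogy under stated assumptions, not how any provider implements PREDICTION JOIN: the model is a plain dictionary from a source attribute value to the most frequent target value, and train and prediction_join are hypothetical helper names.

```python
from collections import Counter, defaultdict


def train(cases, source, target):
    """Build a lookup table: source value -> most frequent target value."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[source]][case[target]] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def prediction_join(model, rows, source, default=None):
    """Return (id, predicted value) for every input row, matched or not."""
    return [(row["id"], model.get(row[source], default)) for row in rows]


training = [
    {"id": 1, "gender": "Male", "age_group": "30-39"},
    {"id": 2, "gender": "Male", "age_group": "30-39"},
    {"id": 3, "gender": "Female", "age_group": "20-29"},
    {"id": 4, "gender": "Male", "age_group": "40-49"},
]
model = train(training, "gender", "age_group")

new_rows = [
    {"id": 10, "gender": "Female"},
    {"id": 11, "gender": "Male"},
    {"id": 12, "gender": "Unknown"},
]
predictions = prediction_join(model, new_rows, "gender")
print(predictions)  # [(10, '20-29'), (11, '30-39'), (12, None)]
```

In this sketch every input case produces exactly one output row, and the right-hand side comes from the learned abstraction rather than from stored training rows. An equi-join against the training table would instead drop unmatched cases and return one row per matching training record.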


Example: Using a DMM in Prediction


SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
    SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    ) AS [Product Purchases]
) AS t
ON  [Age Prediction].[Gender] = t.[Gender]
AND [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name]
AND [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]

Browsing DMM
What is in a DMM?
  Rules, formulas, trees, etc.
Browsing DMM
  Visualization


Concluding Remarks

OLE DB for DM integrates data mining and database systems
  A good standard for mining application builders
How can we be involved?
  Provide association/sequential pattern mining modules for OLE DB for DM?
  Design more concrete language primitives?
References
  http://www.microsoft.com/data.oledb/dm.html