Micro Oledb
Introduction
Overview and design philosophy
Basic components
An industry standard is critical for data mining development, usage, interoperability, and exchange
OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP
Building mining applications over relational databases is nontrivial
Goal: ease the burden of developing mining applications in large relational databases
Data Mining: Concepts and Techniques 3
Generate data mining models
Store, maintain, and refresh models as data is updated
Programmatically use the model on other data sets
Browse models
Overview
Define a mining model
  Attributes to be predicted
  Attributes to be used for prediction
  Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization
Create a data mining model (DMM) object
  CREATE MINING MODEL [model_name]
Insert training data into the model and train it
  INSERT INTO [model_name]
Use the data mining model
  SELECT relation_name.[id], [model_name].[predict_attr]
  Consult the DMM content to make predictions and browse the statistics obtained by the model
Use DELETE to empty/reset the model
Predictions on datasets: a prediction join between a model and a data set (tables)
Deploy a DMM by just writing SQL queries!
[Tables: example case for Customer ID 1 with columns Gender (Male), Hair (Black), Age (35), Age Prob (100%), and two nested tables: Product Purchases (Product Name, Quantity, Product Type: TV 1 Elec; VCR 1 Elec; Ham 6 Food) and Car Ownership (Car, Car Prob: Car 100%; Van 50%). The later tables show the same case flattened into repeated rows.]
Example
CREATE MINING MODEL [Age Prediction]              %Name of model
(
    [Customer ID] LONG KEY,                       %source column
    [Gender] TEXT DISCRETE,                       %source column
    [Age] DOUBLE DISCRETIZED() PREDICT,           %prediction column
    [Product Purchases] TABLE                     %source column
    (
        [Product Name] TEXT KEY,                  %source column
        [Quantity] DOUBLE NORMAL CONTINUOUS,      %source column
        [Product Type] TEXT DISCRETE
            RELATED TO [Product Name]             %source column
    )
)
USING [Decision_Trees_101]                        %Mining algorithm used
January 14, 2014 Data Mining: Concepts and Techniques 14
Column Specifiers
KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
  PROBABILITY: [0, 1]
  VARIANCE
  SUPPORT
  PROBABILITY_VARIANCE
  ORDER
TABLE
Attribute Types
DISCRETE
ORDERED
CYCLICAL
CONTINUOUS
DISCRETIZED
SEQUENCE_TIME
Populating A DMM
Use INSERT INTO statement
Consuming a case using the data mining model
Use the SHAPE statement to create the nested table from the input data
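As a rough sketch of how the two steps above fit together, the statement below populates the [Age Prediction] model defined in the earlier example. The source tables Customers and Sales and the column [CustID] are hypothetical names assumed for illustration; they do not appear in the slides. The syntax follows the OLE DB for DM style of INSERT INTO with a SHAPE source and may differ between provider versions.

```sql
-- Assumed (hypothetical) source tables:
--   Customers([Customer ID], [Gender], [Age])
--   Sales([CustID], [Product Name], [Quantity], [Product Type])
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    -- SKIP ignores the nested rowset's [CustID] column, which is only
    -- needed to relate purchases to their customer
    [Product Purchases] (SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    { SELECT [Customer ID], [Gender], [Age]
      FROM Customers ORDER BY [Customer ID] }
APPEND
    ( { SELECT [CustID], [Product Name], [Quantity], [Product Type]
        FROM Sales ORDER BY [CustID] }
      RELATE [Customer ID] TO [CustID] )
    AS [Product Purchases]
```

Each customer row together with its appended purchase rows forms one training case, which is the nested-table shape the model definition expects.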
Prediction join
Prediction on dataset D using DMM M
Different from an equi-join
DMM: a truth table
The SELECT statement associated with PREDICTION JOIN specifies the values extracted from the DMM
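A prediction query against the [Age Prediction] model from the earlier example might look like the sketch below. The tables Customers and Sales, the column [CustID], and the alias t are hypothetical and assumed for illustration only. The ON clause maps input columns to model columns; unlike an equi-join, it does not require an exact row match in a stored table.

```sql
-- Assumed (hypothetical) input tables holding new cases without a known Age:
--   Customers([Customer ID], [Gender])
--   Sales([CustID], [Product Name], [Quantity])
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
    ( SHAPE
          { SELECT [Customer ID], [Gender]
            FROM Customers ORDER BY [Customer ID] }
      APPEND
          ( { SELECT [CustID], [Product Name], [Quantity]
              FROM Sales ORDER BY [CustID] }
            RELATE [Customer ID] TO [CustID] )
          AS [Product Purchases] ) AS t
ON  [Age Prediction].[Gender] = t.[Gender]
AND [Age Prediction].[Product Purchases].[Product Name]
        = t.[Product Purchases].[Product Name]
AND [Age Prediction].[Product Purchases].[Quantity]
        = t.[Product Purchases].[Quantity]
```

The result pairs each input customer with the age predicted by the model, so the model behaves like a table that can answer for any combination of input attribute values.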
Browsing DMM
What is in a DMM?
Visualization
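As a minimal sketch, assuming the provider exposes model content through the CONTENT suffix as in the OLE DB for DM style of query, the learned content of the [Age Prediction] model from the earlier example could be retrieved for browsing as follows. The shape of the returned rowset depends on the mining algorithm.

```sql
-- Returns the model's learned content (for a decision tree, its nodes
-- and their statistics) as a rowset for reporting or visualization
SELECT * FROM [Age Prediction].CONTENT
```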
Concluding Remarks
OLE DB for DM integrates data mining and database systems
A good standard for mining application builders
How can we be involved?
  Provide association/sequential pattern mining modules for OLE DB for DM?
  Design more concrete language primitives?
References
  http://www.microsoft.com/data.oledb/dm.html