
DATA MINING EXTENSIONS

Introduction
Data mining is a broad term often used to describe examining a large amount of data to extract valuable information. The data mining process involves extending existing information to gain new insights. It includes analysis techniques such as regression, segmentation, clustering, and forecasting. All of these approaches share a common purpose: to project new outcomes based on past experience by using predictive models. MicroStrategy data mining extensions help facilitate the development and deployment of these predictive models. By extending the MicroStrategy platform's analytical, query, and reporting capabilities to the products used to create predictive models, the extensions help you realize more of the potential of your data.

2003 MicroStrategy, Inc.


16

Data Mining Extensions

Advanced Reporting Guide

For example, data mining extensions can help with campaign management. Your company wants to improve the effectiveness of its marketing campaigns, with the goals of reducing costs and increasing the percentage of positive responses. You gather data about the customers targeted in past campaigns, including age, gender, income, education, household size, and whether each customer responded positively or negatively to the campaign. Next, you develop a MicroStrategy report to generate the data set, which is then analyzed to determine whether positive responders shared any factors. Once the predictive factors are identified, you create a MicroStrategy metric that embodies this predictive model. This predictive metric is used to establish the audience that should be targeted in similar future campaigns. Because the metric forecasts who is likely to respond positively, it lowers direct marketing costs and increases campaign effectiveness. The Data mining extensions example at the end of this chapter elaborates on this campaign management example.

The data mining extensions process


The process of creating a predictive model and incorporating it into MicroStrategy reports involves the following steps:
1 Create a data mart report to generate the data set that will be used to develop the predictive model.
2 Create a predictive model from the data set, using a third-party application.
3 Import the predictive model into your MicroStrategy project.
4 Create a metric, referred to as the predictive metric, based on the predictive model.
5 Use the predictive metric in MicroStrategy reports to project new outcomes based on the past experience embodied in the data set.


Each step is described in more detail below. Step-by-step procedures for the processes that take place in MicroStrategy are in the online help; this chapter gives a high-level process for each.

Creating a data mart


The first step in creating a predictive model is developing a data mart report and using it to generate a data mart. This data is then used in a third-party data mining tool to develop a predictive model.

A data mart is a database, usually smaller than a data warehouse, that stores report results in the form of relational tables. It usually focuses on a specific subject or department. The data mart report is a special kind of report that saves its report data in a data mart database rather than returning those results to the user. For more information on data marts, see Chapter 12, Data Marting.

A data mart usually has the following features:
- Each column represents a particular type of data that either has predictive value or represents an outcome worth predicting.
- Each row represents a specific attribute element, such as a customer, transaction, or product.
- The data mart can be easily created, refreshed, and accessed, even if it is large and complex.

The database tables created by the data mart report are easily accessed by third-party data mining tools, since most of them access data using ODBC. This setup also promotes consistency between the data set used to develop the predictive model and the inputs to the predictive metric created from the imported predictive model.
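To make the "single header row, one row per entity" shape concrete, the following Python sketch builds a tiny data mart table in an in-memory SQLite database and reads it back. SQLite and the table and column names (customer_mart, age, household_count, responded) are stand-ins chosen for illustration only; a real data mart lives in your warehouse database and is read by the mining tool through ODBC, not through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical data mart content: one row per customer (the attribute on the report rows)
# and one column per predictor or outcome. Names and values are illustrative only.
MART_ROWS = [
    ("C001", 34, 2, 1),
    ("C002", 51, 4, 0),
    ("C003", 27, 1, 1),
]


def build_data_mart(conn: sqlite3.Connection) -> None:
    """Create a flat table (a single header row of column names) and load the rows."""
    conn.execute(
        "CREATE TABLE customer_mart ("
        "customer_id TEXT PRIMARY KEY, "
        "age INTEGER, "
        "household_count INTEGER, "
        "responded INTEGER)"
    )
    conn.executemany("INSERT INTO customer_mart VALUES (?, ?, ?, ?)", MART_ROWS)
    conn.commit()


def read_data_mart(conn: sqlite3.Connection):
    """Read the table back as column names plus rows, as an external tool might."""
    cur = conn.execute("SELECT * FROM customer_mart ORDER BY customer_id")
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_data_mart(conn)
    header, rows = read_data_mart(conn)
    print(header)  # ['customer_id', 'age', 'household_count', 'responded']
    for row in rows:
        print(row)
```

Because every column is a plain, single-level column, any tool that can issue a SELECT can consume the table without reshaping it.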


The report sample below is an example of a data mart report for customer information. Note that not all columns or rows have been included, due to space constraints.

Notice that each attribute, such as Age Range, has two attribute forms on the report: the description and the ID. Some data mining software works better with numbers, such as the ID, while the description is included for ease of use.

Guidelines for the data mart report


Before creating the data mart, ensure that the attributes and metrics used in the data mart report can also be used as inputs to the predictive metric. (Recall that the predictive metric is the metric created from the predictive model after it is imported into MicroStrategy.) If the inputs are not properly defined, the model cannot properly predict scenarios. The following guidelines will help you create your data mart report:

- Use a flat report template, placing attributes on rows only and metrics on columns only. The data mart is a database table consisting of a single header row followed by rows of data. Placing attributes on the columns usually creates multiple header rows on each column, which cannot be easily represented in a database table.


- Only metrics can be used as predictors. The data mart report can contain attributes, which can be useful when you review the data mart. However, attributes cannot be used as predictors because inputs to the predictive metric can only be metrics. Neither attributes nor attribute forms can be used as inputs.

To use an attribute as a predictor, create a metric based on it, called a shadow metric, which represents the attribute form to include in the model. For more information, see the Shadow metrics section below.

- The metric level must match the attributes used on the rows of the data mart report. Attributes on the rows of the data mart add dimensionality to the data. If a metric is used in the predictive model without any dimensionality (also known as the level), its results change based on the attributes of the report that uses the predictive metric. Creating another type of shadow metric, one that sets the metric level, resolves this problem. For more information, see the Dimensional shadow metrics section below.

- If you need to group a metric's results by an attribute, use a filter. The filter allows you to use an attribute to qualify metrics. For example, you could display customer revenue by payment method. For more information, see the Filtered shadow metrics section below.

Shadow metrics
A shadow metric allows you to use an attribute as a predictor in your predictive model. It represents the attribute form you want to include in the model. Attributes cannot be used as predictors in the data mart because the predictive metric accepts only metrics as inputs, not attributes or attribute forms.


Data mining analyzes demographic and psychographic information about customers, looking for attributes that are strong predictors. For example, suppose your MicroStrategy project contains a Customer attribute with attribute forms for age, gender, and income. To use these attribute forms in a predictive model, you create shadow metrics for them. A shadow metric for age would look like the following:

Max(Customer@Age) {ReportLevel}
High-level process - Creating a shadow metric
1 Create a new metric using the required attribute.
2 Change the attribute form, as necessary.
3 Enable outer joins to include all data.
4 Create a metric column alias to automatically create the predictive metric when the predictive model is imported.
5 Save the metric, using the alias as the metric name.

You do not need to set the level of these shadow metrics, because by definition they are calculated at the level of the attribute.
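The idea behind the Max(Customer@Age) shadow metric can be illustrated outside MicroStrategy. The following Python sketch (not MicroStrategy code) assumes hypothetical joined rows in which a customer can appear several times. Because each customer has a single Age value, taking the maximum over that customer's rows simply returns the age, but as a number that a predictive model can consume.

```python
# Hypothetical joined rows: (customer_id, age). A customer may appear on several rows
# (for example, one per transaction), but the Age form has a single value per customer.
ROWS = [("C001", 34), ("C001", 34), ("C002", 51), ("C003", 27), ("C003", 27)]


def shadow_metric_max(rows):
    """Mimic Max(Customer@Age): reduce an attribute form to one number per customer."""
    result = {}
    for customer, value in rows:
        if customer not in result or value > result[customer]:
            result[customer] = value
    return result


if __name__ == "__main__":
    print(shadow_metric_max(ROWS))  # {'C001': 34, 'C002': 51, 'C003': 27}
```

Using Max rather than Sum matters here: summing the repeated rows would report 68 for customer C001 instead of the actual age, 34.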

Dimensional shadow metrics


Placing attributes on the rows of the data mart report adds dimensionality to the data by restricting the data to a particular dimension, or level, of the data model. For example, if the Customer attribute is placed on the rows and the Revenue metric on the columns, the data in the Revenue column is at the customer level. If the Revenue metric is used in the predictive model without any dimensionality, the data it produces changes based on the attributes of the report using the predictive metric. If Year is placed on the rows of that report, the predictive metric calculates yearly revenue rather than revenue per customer.


However, if you create a shadow metric set at the Customer level for the predictive model, the predictive metric on the final report calculates at the Customer level. The following procedure outlines the high-level process of creating such a dimensional shadow metric.
High-level process - Creating a dimensional shadow metric
1 Open the metric that requires dimensionality.
2 Add the necessary attributes as metric levels.
3 Enable outer joins to include all data.
4 Create a metric column alias to automatically create the predictive metric when the predictive model is imported.
5 Save the metric, using the alias as the metric name.
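The difference between an unleveled metric and one fixed at the Customer level can be sketched in Python with hypothetical revenue facts. This is a simplified model of level semantics that ignores MicroStrategy's level filtering and grouping options: the unleveled metric is grouped by whatever attribute the report places on its rows, while the Customer-level metric is computed per customer first and then shown unchanged on every row for that customer.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer_id, year, revenue).
FACTS = [
    ("C001", 2000, 120.0),
    ("C001", 2001, 80.0),
    ("C002", 2000, 50.0),
    ("C002", 2001, 200.0),
]


def revenue_by(facts, key):
    """Sum revenue grouped by the value that `key` extracts from each fact row."""
    totals = defaultdict(float)
    for row in facts:
        totals[key(row)] += row[2]
    return dict(totals)


# No fixed level: the metric follows the report's rows, so a Year report gets yearly revenue.
by_year = revenue_by(FACTS, key=lambda row: row[1])

# Fixed at the Customer level: revenue is computed per customer first and that value is
# shown for the customer on every report row, whatever other attributes are present.
customer_revenue = revenue_by(FACTS, key=lambda row: row[0])
customer_year_rows = [(c, y, customer_revenue[c]) for c, y, _ in FACTS]

if __name__ == "__main__":
    print(by_year)           # {2000: 170.0, 2001: 280.0}
    print(customer_revenue)  # {'C001': 200.0, 'C002': 250.0}
    print(customer_year_rows)
```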

Filtered shadow metrics


To group a metric's results by an attribute, create a filtered metric for each category. For example, suppose you need to display customer revenue by payment method. You place the Customer attribute on the rows of the report, and the Revenue metric and the Payment Method attribute on the columns. The resulting report is displayed below.


This report presents problems if it is used as a data mart report. Because each column has multiple headers, the data mart version moves Payment Method to the rows. The result is a separate row for each payment method, or five rows for each customer. Since the metric is called Revenue on every row, it is difficult to tell the different types of revenue apart. Creating a filtered metric for each payment method and using those metrics on the data mart report results in a data mart that is easier to use.
High-level process - Creating a filtered shadow metric
1 Create a filter for each of the necessary attribute elements. In the example above, they are Visa, Amex, Check, and so on.
2 For each attribute element, create a dimensional shadow metric, adding the filter for that attribute element.
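The effect of filtered shadow metrics on the shape of the data mart is essentially a pivot: several rows per customer (one per payment method) become one row per customer with one revenue column per payment method. The following Python sketch shows that reshaping on hypothetical rows; it is not how MicroStrategy computes filtered metrics, and the column names such as revenue_visa are illustrative only.

```python
from collections import defaultdict

# Hypothetical long-format rows: (customer_id, payment_method, revenue). With Payment
# Method on the rows, each customer gets one row per payment method.
LONG_ROWS = [
    ("C001", "Visa", 100.0),
    ("C001", "Check", 25.0),
    ("C002", "Amex", 60.0),
]
METHODS = ["Visa", "Amex", "Check"]  # illustrative subset of payment methods


def filtered_shadow_metrics(rows, methods):
    """Return a header and one row per customer with one revenue column per method.

    Each column plays the role of one filtered metric (for example, revenue
    filtered to Visa). Methods not listed in `methods` are ignored. This sketch
    fills 0.0 where a customer has no revenue for a method; a real data mart
    might hold nulls instead.
    """
    table = defaultdict(lambda: {m: 0.0 for m in methods})
    for customer, method, revenue in rows:
        if method not in methods:
            continue
        table[customer][method] += revenue
    header = ["customer_id"] + [f"revenue_{m.lower()}" for m in methods]
    body = [[c] + [table[c][m] for m in methods] for c in sorted(table)]
    return header, body


if __name__ == "__main__":
    header, body = filtered_shadow_metrics(LONG_ROWS, METHODS)
    print(header)  # ['customer_id', 'revenue_visa', 'revenue_amex', 'revenue_check']
    for row in body:
        print(row)
```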

The following report uses filtered shadow metrics, which are very useful for data mining.


The process of data mart generation for data mining


Once your report meets the data mart report guidelines, creating a data mart from it is straightforward, as outlined below. See the online help for more detailed instructions.
High-level process - Creating a data mart
1 Create a new report, with only attributes on the rows and only metrics on the columns.
2 Enable outer joins so that rows are not excluded if metric data is missing.
3 Run the report to display the results.
4 Configure the data mart.
5 Save the report.
6 Run the report to populate the data mart table in the database.
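Why step 2 matters can be seen with plain SQL. The following Python sketch uses a hypothetical two-table schema in an in-memory SQLite database; the SQL that MicroStrategy actually generates will differ. An inner join silently drops a customer who has no revenue rows, while a left outer join keeps that customer with an empty (NULL) revenue value, so the data mart still contains one row per customer.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: every customer, plus revenue facts for only some of them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id TEXT PRIMARY KEY);
        CREATE TABLE revenue (customer_id TEXT, amount REAL);
        INSERT INTO customer VALUES ('C001'), ('C002'), ('C003');
        INSERT INTO revenue VALUES ('C001', 120.0), ('C003', 45.0);
        """
    )
    return conn


def mart_rows(conn: sqlite3.Connection, outer: bool):
    """Join customers to revenue; an inner join drops customers with no revenue rows."""
    join = "LEFT OUTER JOIN" if outer else "INNER JOIN"
    sql = (
        "SELECT c.customer_id, r.amount "
        f"FROM customer c {join} revenue r ON r.customer_id = c.customer_id "
        "ORDER BY c.customer_id"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(mart_rows(conn, outer=False))  # [('C001', 120.0), ('C003', 45.0)]
    print(mart_rows(conn, outer=True))   # [('C001', 120.0), ('C002', None), ('C003', 45.0)]
```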

Creating a predictive model


Once your data mart has been created as a table in the database, third-party data mining tools can access it. These tools let you develop a predictive model through a workflow of data exploration and discovery. Because they are not part of MicroStrategy, refer to their documentation for specific instructions on importing data and creating a model. Currently, MicroStrategy data mining extensions support only ANGOSS KnowledgeSTUDIO. The following procedure walks you through importing a MicroStrategy data mart into KnowledgeSTUDIO.
Import data into ANGOSS KnowledgeSTUDIO

1 Open KnowledgeSTUDIO.
2 Select File, then New Project to create a new project. The Project Wizard opens on the Specify the Compute Server - Step 1 page and guides you through the process.


3 Do not change the default of My Computer as the selected Compute Server. Since you do not have to change anything on this page, you can click Skip this step in the future.
4 Click Next. The Specify the Project Name and Description - Step 2 page opens.
5 Enter a Name, Location, and Description for your project.
6 Click Next. The Getting Started - Step 3 page, which is the final page, opens.
7 Select Insert a Data Set.
8 Click Finish. The Insert Data Set Wizard opens on the Dataset Source - Step 1 page.
9 Select Copy and transform the data from source into minable form.
10 Select ODBC Import Driver from the Source Driver Type drop-down list.
11 Click Next. The ODBC Connection Setup - Step 2 page opens.
12 Click Browse to select the database that contains the data in your MicroStrategy project.
13 If necessary, enter login information for the selected database.
14 Click Next. The Table Selection - Step 3 page opens.
15 In the Selected Table box, select the table that contains your data mart.
16 Click Next. The Dataset Editor - Step 5 page opens.

The Step 4 page is skipped for this particular setup.


17 Make any adjustments to the data, such as whether to include each item or its data type.
18 Click Next. The Dataset Name - Step 6 page opens.

19 Enter a name for the data set.
20 Click Finish. The data set is imported.

Now that the data set is available in the third-party data mining tool, you can begin the process of data exploration and discovery that exposes the predictive information the data contains. These data mining tools provide many ways of extracting information from the data, including decision trees, cluster analysis, and advanced statistics. Ultimately, the process should lead to a predictive model. Because actual data mining workflows and techniques can be sophisticated and vendor-specific, they are beyond the scope of this document. For more information about using a particular data mining tool, consult the tool's documentation or contact the third-party vendor.

After the predictive model has been created, you will want to use it within the MicroStrategy Business Intelligence Platform. First, you must import it into your MicroStrategy project, which is the subject of the next section.

Importing the model


After you have created a predictive model, you must import it into your MicroStrategy project. The steps are listed below and then described in more detail.
1 Export the model from the third-party data mining tool in an XML format.
2 Import the model into your MicroStrategy project.


Exporting the model


Because the data mining tools are not part of MicroStrategy, refer to their documentation for specific instructions on exporting models. Currently, MicroStrategy data mining extensions support only ANGOSS KnowledgeSTUDIO. The following procedure walks you through exporting a model from KnowledgeSTUDIO to MicroStrategy.
Export a model from ANGOSS KnowledgeSTUDIO

1 In KnowledgeSTUDIO, double-click the predictive model in the left pane.
2 From the menu bar, select Insert, then Code Generation. The Code Generation dialog box opens.
3 Select XML as the Code Generation Code Type.
4 Click OK. KnowledgeSTUDIO generates the XML code for your model and adds it to the project as a Note.
5 From the menu bar, select File, then Save Text File As. The Save As dialog box opens.
6 Enter a name for the export file, using a .txt extension.
7 Select the directory in which to save the file.
8 Click OK. The XML is saved to the specified file and location.
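Because the exported model is XML saved under a .txt extension, a quick sanity check before importing it is to confirm that the file parses as well-formed XML. The following Python sketch does only that, using the standard xml.etree.ElementTree module; it knows nothing about the structure KnowledgeSTUDIO writes, and the file name in the usage comment is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def check_exported_model(path: str) -> str:
    """Parse the exported file as XML and return the name of its root element.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed XML.
    The check says nothing about whether the model itself is valid for import.
    """
    return ET.parse(path).getroot().tag


if __name__ == "__main__":
    # Usage: python check_model.py ResponseModel.txt  (file name is hypothetical)
    if len(sys.argv) != 2:
        print("usage: check_model.py <exported-model-file>")
    else:
        try:
            print("Root element:", check_exported_model(sys.argv[1]))
        except ET.ParseError as err:
            print("Not well-formed XML:", err)
            sys.exit(1)
```

ET.parse reads the file as bytes, so it honors any encoding declared in the XML header regardless of the .txt extension.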


Importing the model


Once the model is available in an XML format, you can import it into your MicroStrategy project. The high-level process is included below. For detailed instructions, see the online help.

Before you can access the Import Data Mining Model option, you must have the Use Data Mining Import Dialog privilege. For information on viewing or changing your privileges, see the online help.
High-level process - Importing the model
1 Select Import Data Mining Model from the Schema menu of Desktop.
2 Choose the third-party tool used to create the model.
3 Select the model to import.
4 Name the model.
5 Click OK to import the model.

Creating a predictive metric


A predictive metric is a metric that displays the results of a predictive model. You can use the metric to analyze the results of the model against actual data in the warehouse, generating possible scenarios. A predictive metric is created and calculated like any other metric. It is built on the model you import and invokes a vendor-specific function, described below. That function's input parameters are the metrics from the data set that had predictive value. The vendor-specific function is created the first time a model from a particular vendor is imported; after that, it is used by the predictive metric generated when each model is imported. ANGOSS creates two vendor-specific functions:
- DmxAngossNum, for models with all numeric inputs
- DmxAngossVar, for models with any variant (that is, non-numeric) inputs
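The choice between the two ANGOSS functions is made for you when the model is imported. The following Python sketch only restates the selection rule so that you can check which function a given model should use; the input names and the "numeric" type label are hypothetical.

```python
def angoss_function_for(input_types: dict) -> str:
    """Return the vendor function name the rule above associates with a model's inputs.

    `input_types` maps each (hypothetical) model input name to a declared type string.
    All inputs numeric -> DmxAngossNum; any non-numeric input -> DmxAngossVar.
    """
    if all(t == "numeric" for t in input_types.values()):
        return "DmxAngossNum"
    return "DmxAngossVar"


if __name__ == "__main__":
    print(angoss_function_for({"AGE_RANGE_ID": "numeric", "HOUSEHOLD_COUNT": "numeric"}))
    print(angoss_function_for({"AGE_RANGE_ID": "numeric", "GENDER_DESC": "string"}))
```

This is one reason the data mart includes ID forms alongside descriptions: a model built only on numeric IDs and counts can use DmxAngossNum.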


If MicroStrategy can identify the input metrics from the names of the model's inputs, the predictive metric is created automatically during the import process. For this to happen, you must have defined a metric column alias when you created the shadow metrics for the data mart report. If any of the required input metrics are not found, you are prompted to select the appropriate inputs.
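The matching described above can be pictured as a lookup from model input names to metric column aliases. The following Python sketch is an illustration of that idea, not MicroStrategy's import logic: the input names, aliases, and the exact, case-sensitive matching are all assumptions. Any input without a matching alias ends up in the list you would be prompted to resolve.

```python
def resolve_model_inputs(model_inputs, metric_aliases):
    """Match model input names to metrics by their column aliases.

    model_inputs: input names listed in the imported model (hypothetical).
    metric_aliases: maps a metric column alias to the metric it belongs to.
    Returns (matched, missing): matched maps input name -> metric; missing lists
    the inputs you would have to pick by hand. Exact, case-sensitive matching
    is an assumption of this sketch.
    """
    matched, missing = {}, []
    for name in model_inputs:
        if name in metric_aliases:
            matched[name] = metric_aliases[name]
        else:
            missing.append(name)
    return matched, missing


if __name__ == "__main__":
    inputs = ["GENDER_ID", "AGE_RANGE_ID", "EDUCATION_ID", "HOUSEHOLD_COUNT"]
    aliases = {
        "GENDER_ID": "Gender shadow metric",
        "AGE_RANGE_ID": "Age Range shadow metric",
        "EDUCATION_ID": "Education shadow metric",
    }
    matched, missing = resolve_model_inputs(inputs, aliases)
    print(missing)  # ['HOUSEHOLD_COUNT'] -> you would be prompted for this input
```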

Using the predictive metric in reports


Once the predictive model is implemented as a metric, you can use it in reports to determine possible trends and outcomes. Creating a report for data mining is similar to creating a regular report: the only special requirement is that the report include the predictive metric. For more detailed instructions on creating a report, see the online help.

Data mining extensions example


Recall the campaign management scenario described at the beginning of this chapter. Your company wants to improve the effectiveness of its marketing campaigns, with the goals of reducing costs and increasing the percentage of positive responses. The results of a previous campaign will be analyzed to determine what factors, if any, can be used to predict the performance of a similar future campaign. The previous campaign was run during the fall of 2000, when 177 customers out of 842 responded favorably. Accurate data on the targeted customers and their responses have been entered into a MicroStrategy project.

The first step in data mining is the creation of a data mart report. The pertinent customer information for this report is listed below:
- name
- age
- education
- gender
- household count
- income range
- marital status

A sample data mart report for this example can be found in the Creating a data mart section.

You want to use all of these attributes except customer name as predictors in the predictive model. Therefore, you must create a shadow metric for each, since the predictive metric accepts only metrics as inputs. Some example shadow metrics for this report are shown below:

Max([Customer Age Range]@DESC) {Customer}
Max([Customer Age Range]@ID) {Customer}
Max([Customer Education]@DESC) {Customer}

Filter the data mart report to include only the customers acquired before October 1, 2000, since newer customers would not have been included in the fall 2000 campaign. Once the data mart report is complete, use it to generate the data mart.

Using a third-party data mining tool, import and then analyze the data set represented by the data mart. Even though a large number of attributes were included, the resulting model includes only those attributes that were established as strong predictors: gender, age, education, and household count. Construct a predictive model and import its XML representation into MicroStrategy. The import creates a predictive metric called ResponsePredictor, which incorporates those attributes. The predictive metric, which uses the DmxAngossNum function, is shown below:


The analytical function DmxAngossNum actually calls a third-party neural network algorithm, using the MicroStrategy function plug-in technology. This neural network algorithm has been trained using the data from the data mart. Now that the metric has been created, validate it against the original data in a report like the one below, which compares the actual response with the response calculated by the predictive metric.


This report shows that the metric correctly predicted 792 out of 842 responses, which corresponds to 94 percent accuracy. Of the 177 customers who responded positively, it correctly predicted 144 of them, or 81 percent. This accuracy is acceptable for marketing purposes. Finally, you can use the metric to predict the responses of the customers who were not used in developing the model, that is, the customers acquired on or after October 1, 2000. The report below shows that out of 600 new customers, 93 are likely to respond positively to a campaign similar to that of fall 2000:

Based on these results, the number of customers targeted can be greatly reduced, from 600 to 93. Campaign costs will decrease, while the percentage of positive responses will likely increase significantly.
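The percentages quoted in this example can be checked, and the remaining cells of the validation results derived, with a few lines of arithmetic. The following Python sketch uses only the figures given above (842 customers, 177 actual responders, 792 correct predictions, 144 correctly predicted responders, and 93 of 600 new customers). The false-positive and true-negative counts are derived from those figures, not read from the report, and assume they are mutually consistent.

```python
# Figures quoted in the example above.
TOTAL_CUSTOMERS = 842
ACTUAL_RESPONDERS = 177
CORRECT_PREDICTIONS = 792
CORRECT_RESPONDERS = 144
NEW_CUSTOMERS = 600
PREDICTED_NEW_RESPONDERS = 93


def confusion_counts(total, actual_pos, correct, true_pos):
    """Derive the full confusion matrix from the four validation figures."""
    true_neg = correct - true_pos                # correct predictions that were non-responders
    false_neg = actual_pos - true_pos            # responders the metric missed
    false_pos = (total - actual_pos) - true_neg  # non-responders predicted to respond
    return {"tp": true_pos, "fn": false_neg, "fp": false_pos, "tn": true_neg}


counts = confusion_counts(
    TOTAL_CUSTOMERS, ACTUAL_RESPONDERS, CORRECT_PREDICTIONS, CORRECT_RESPONDERS
)
accuracy = CORRECT_PREDICTIONS / TOTAL_CUSTOMERS             # about 0.94
responder_hit_rate = CORRECT_RESPONDERS / ACTUAL_RESPONDERS  # about 0.81
targeted_share = PREDICTED_NEW_RESPONDERS / NEW_CUSTOMERS    # 0.155

if __name__ == "__main__":
    print(counts)  # {'tp': 144, 'fn': 33, 'fp': 17, 'tn': 648}
    print(f"accuracy {accuracy:.0%}, responders found {responder_hit_rate:.0%}, "
          f"customers targeted {targeted_share:.1%}")
```

Targeting only the predicted responders means contacting roughly 15.5 percent of the new customers.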
