Data Mining
Introduction
Data mining is a broad term often used to describe examining a large amount of data to extract valuable information. The data mining process involves extending existing information to gain new insights. It includes analysis techniques such as regression, segmentation, clustering, and forecasting. All of these approaches have a common purpose: to project new outcomes based on past experience by using predictive models. MicroStrategy data mining extensions help facilitate the development and deployment of these predictive models. Extending the MicroStrategy platform's powerful analytical, query, and reporting capabilities to products used to create predictive models helps you develop the potential of your data.
For example, data mining extensions can help with campaign management. Your company wants to improve the effectiveness of its marketing campaigns, with the goals of reducing costs and increasing the percentage of positive responses. You gather data about the customers targeted for past campaigns. This includes information such as age, gender, income, education, household size, and whether they responded positively or negatively to the campaign. Next, you develop a MicroStrategy report to generate the data set, which is then analyzed to determine if positive responders shared any factors. Once the predictive factors are identified, you create a MicroStrategy metric that embodies this predictive model. This predictive metric is used to establish the audience that should be targeted in similar future campaigns. The metric can forecast who is likely to respond positively, thereby lowering direct marketing costs and increasing effectiveness.
The Data mining extensions example at the end of this chapter elaborates on this campaign management example.
Each step is described in more detail below. While the step-by-step procedures for all processes that take place in MicroStrategy are included in the online help, a high-level process for each is included below.
The data mart can be easily created, refreshed, and accessed, even if it is large and complex. The database tables created by the data mart report are easily accessed by third-party data mining tools, since most of them access data using ODBC. This setup also promotes consistency between the data set used to develop the predictive model and the inputs to the predictive metric created from the imported predictive model.
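As a rough illustration of why a data mart table is easy for outside tools to consume, the sketch below reads one with plain SQL. It uses Python's built-in sqlite3 module as a stand-in for an ODBC connection, and the table name, column names, and rows are all hypothetical, not taken from a MicroStrategy project.

```python
import sqlite3

# Stand-in for the warehouse: an in-memory SQLite database.
# A real mining tool would open an ODBC connection instead (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical data mart table, shaped like a customer data mart report:
# an attribute ID, its description, and a response flag per customer.
conn.execute(
    "CREATE TABLE customer_datamart ("
    "customer_id INTEGER, age_range_id INTEGER, "
    "age_range_desc TEXT, responded INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_datamart VALUES (?, ?, ?, ?)",
    [(1, 2, "25-34", 1), (2, 4, "45-54", 0), (3, 2, "25-34", 0)],
)

# The mining tool only needs ordinary SQL to pull the numeric columns.
rows = conn.execute(
    "SELECT customer_id, age_range_id, responded "
    "FROM customer_datamart ORDER BY customer_id"
).fetchall()
print(rows)
```

Because the same table feeds both model development and, later, the predictive metric's inputs, the column definitions stay consistent between the two.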
The report sample below is an example of a data mart report for customer information. Note that not all columns or rows have been included, due to space constraints.
Notice that each attribute, such as Age Range, has two attribute forms on the report: the description and the ID. Some data mining software works better using numbers, such as the ID, while the description is included for ease of use.
Only metrics can be used as predictors. The data mart report can contain attributes, which can be useful when you review the data mart. However, they cannot be used as predictors because inputs to the predictive metric can only be metrics. Neither attributes nor attribute forms can be used as inputs.
To use an attribute as a predictor, create a metric using the attribute. A shadow metric represents the attribute form to be included in the model. It allows you to use an attribute as a predictor. For more information, see the Shadow metrics section below.
The metric level must match the attributes used on the rows of the data mart report. Attributes on the rows of the data mart add dimensionality to the data. If a metric is used in the predictive model without any dimensionality (also known as the level), its results change based on the attributes of the report using the predictive metric. Creating another type of shadow metric, one that sets the metric level, resolves this problem. For more information, see the Dimensional shadow metrics section below.
If you need to group a metric's results by an attribute, use a filter. The filter allows you to use an attribute to qualify metrics. For example, you could display customer revenue by payment method. For more information, see the Filtered shadow metrics section below.
Shadow metrics
A shadow metric allows you to use an attribute as a predictor in your predictive model. It represents the attribute form you want to include in the model. Attributes cannot be used as predictors in the data mart because the predictive metric accepts only metrics as inputs, not attributes or attribute forms.
Data mining analyzes demographic and psychographic information about customers, looking for attributes that are strong predictors. For example, your MicroStrategy project contains a Customer attribute with attribute forms for age, gender, and income. To use these attribute forms in a predictive model, you would create shadow metrics for them. A shadow metric for age would look like the following:
Max(Customer@Age) {ReportLevel}
High-level process - Creating a shadow metric
1 Create a new metric using the required attribute.
2 Change the attribute form, as necessary.
3 Enable outer joins to include all data.
4 Create a metric column alias to automatically create the predictive metric when the predictive model is imported.
5 Save the metric, using the alias as the metric name.
You do not need to set the level of these shadow metrics, because by definition they are calculated at the level of the attribute.
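To make the idea of a shadow metric more concrete, the following Python sketch mimics what an expression such as Max(Customer@Age) does: it turns an attribute form into a numeric value per customer. The rows, field names, and the helper shadow_metric_max are hypothetical, and this is only an analogy, not how MicroStrategy evaluates metrics internally.

```python
# Hypothetical fact rows: one row per order, each carrying the
# customer's age form alongside the order revenue.
orders = [
    {"customer": "C1", "age": 34, "revenue": 120.0},
    {"customer": "C1", "age": 34, "revenue": 80.0},
    {"customer": "C2", "age": 51, "revenue": 45.0},
]

def shadow_metric_max(rows, level, form):
    """Return max(form) for each element of `level`.

    Hypothetical helper that imitates Max(Attribute@Form) calculated
    at the level of the attribute itself.
    """
    result = {}
    for row in rows:
        key = row[level]
        value = row[form]
        if key not in result or value > result[key]:
            result[key] = value
    return result

age_by_customer = shadow_metric_max(orders, "customer", "age")
print(age_by_customer)
```

Because each customer has a single age, the maximum simply returns that age. The aggregation exists only so that the attribute form can be handled as a metric.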
However, if you create a shadow metric set at the Customer level for the predictive model, the predictive metric on the final report calculates at the Customer level. The following table outlines the high-level process of creating such a dimensional shadow metric.
High-level process - Creating a dimensional shadow metric
1 Open the metric that requires dimensionality.
2 Add the necessary attributes as metric levels.
3 Enable outer joins to include all data.
4 Create a metric column alias to automatically create the predictive metric when the predictive model is imported.
5 Save the metric, using the alias as the metric name.
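The reason the metric level matters can be sketched in a few lines of Python. The same revenue measure yields different values depending on the attribute it is grouped by, so a model trained on customer-level values needs inputs fixed at the Customer level. The data and the helper sum_at_level are hypothetical illustrations, not MicroStrategy functionality.

```python
from collections import defaultdict

# Hypothetical sales facts with two possible grouping attributes.
sales = [
    {"customer": "C1", "region": "East", "revenue": 100.0},
    {"customer": "C1", "region": "West", "revenue": 50.0},
    {"customer": "C2", "region": "East", "revenue": 70.0},
]

def sum_at_level(rows, level, measure):
    """Sum `measure` for each element of the attribute named `level`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[level]] += row[measure]
    return dict(totals)

# A metric with no fixed level follows whatever attribute the report uses.
by_region = sum_at_level(sales, "region", "revenue")

# A dimensional shadow metric pins the calculation to the Customer level.
by_customer = sum_at_level(sales, "customer", "revenue")

print(by_region)
print(by_customer)
```

Here the region totals and the customer totals differ, which is the mismatch a dimensional shadow metric is meant to prevent.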
This report presents problems if it is used as a data mart report. Since multiple headers exist on each column, the data mart version moves Payment Method to the rows. The result is a separate row for each Payment Method, or five rows for each customer. Since the metric is called Revenue, it is difficult to differentiate between the different types of revenue. Creating a filtered metric for each Payment Method and using the metrics on the data mart report result in a data mart that is easier to use.
High-level process - Creating a filtered shadow metric
1 Create a filter for each of the necessary attribute elements. In the example above, they are Visa, Amex, Check, and so on.
2 For each attribute element, create a dimensional shadow metric, adding the filter for that attribute element.
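The effect of filtered shadow metrics can be sketched as a pivot: instead of one row per customer and payment method, each customer gets a single row with one revenue column per payment method. The rows, the column naming pattern, and the helper filtered_metrics below are hypothetical, and filling missing combinations with 0.0 is an assumption (a real data mart might hold nulls instead).

```python
# Hypothetical rows as they would appear with Payment Method on the rows:
# (customer, payment method, revenue).
rows = [
    ("C1", "Visa", 100.0),
    ("C1", "Check", 25.0),
    ("C2", "Amex", 60.0),
]
payment_methods = ["Visa", "Amex", "Check"]

def filtered_metrics(rows, methods):
    """Build one record per customer with one revenue column per method."""
    table = {}
    for customer, method, revenue in rows:
        # Start every customer with a zero-filled column for each method.
        record = table.setdefault(
            customer, {f"{m} Revenue": 0.0 for m in methods}
        )
        if method in methods:
            record[f"{method} Revenue"] += revenue
    return table

datamart = filtered_metrics(rows, payment_methods)
for customer, record in datamart.items():
    print(customer, record)
```

Each column now has a distinct name, so a mining tool can tell the different types of revenue apart.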
The following report uses filtered shadow metrics, which are very useful for data mining.
1 Open KnowledgeSTUDIO.
2 Select File, then New Project to create a new project. The Project Wizard opens on the Specify the Compute Server - Step 1 page and guides you through the process.
3 Do not change the default of My Computer as the selected Compute Server. Since you do not have to change anything on this page, you can click Skip this step in the future.
4 Click Next. The Specify the Project Name and Description - Step 2 page opens.
5 Enter a Name, Location, and Description for your project.
6 Click Next. The Getting Started - Step 3 page, which is the final page, opens.
7 Select Insert a Data Set.
8 Click Finish. The Insert Data Set Wizard opens on the Dataset Source - Step 1 page.
9 Select Copy and transform the data from source into minable form.
10 Select ODBC Import Driver from the Source Driver Type drop-down list.
11 Click Next. The ODBC Connection Setup - Step 2 page opens.
12 Click Browse to select the database that contains the data in your MicroStrategy project.
13 If necessary, enter login information for the selected database.
14 Click Next. The Table Selection - Step 3 page opens.
15 In the Selected Table box, select the table that contains your data mart.
16 Click Next. The Dataset Editor - Step 5 page opens.
19 Enter a name for the dataset.
20 After clicking Finish, the data set is imported.
Now that the data set is available in the third-party data mining tool, you can begin the process of data exploration and discovery that exposes the predictive information the data contains. These data mining tools provide many ways of extracting information from the data, including decision trees, cluster analysis, and advanced statistics. Ultimately, the process should lead to a predictive model.
As the actual data mining workflows and techniques can be sophisticated and vendor-specific, they are beyond the scope of this document. For more information about using a particular data mining tool, consult the tool's documentation or contact the third-party vendor.
After the predictive model has been created, you will want to use it within the MicroStrategy Business Intelligence Platform. First, you must import it into your MicroStrategy project, which is the subject of the next section.
1 In KnowledgeSTUDIO, double-click the predictive model in the left pane.
2 From the menu bar, select Insert, then Code Generation. The Code Generation dialog box opens.
3 Select XML as the Code Generation Code Type.
4 Click OK. KnowledgeSTUDIO generates the XML code for your model and adds it to the project as a Note.
5 From the menu bar, select File, then Save Text File As. The Save As dialog box opens.
6 Enter a name for the export file, using a .txt extension.
7 Select the directory in which to save the file.
8 Click OK. The XML is saved to the specified file and location.
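Before importing the exported file, it can be useful to confirm that it still contains well-formed XML, since it is saved with a .txt extension. The sketch below uses Python's standard xml.etree.ElementTree parser for that check. The element names in the samples are placeholders and do not reflect the actual schema KnowledgeSTUDIO generates, which this document does not describe.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Placeholder documents; real exports will use the vendor's own elements.
sample_ok = "<Model><Input name='Gender'/></Model>"
sample_bad = "<Model><Input name='Gender'></Model>"  # Input is never closed

print(is_well_formed_xml(sample_ok))
print(is_well_formed_xml(sample_bad))
```

To check a saved export, read the file's contents into a string and pass it to is_well_formed_xml. The check says nothing about whether the model itself is valid, only that the text parses.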
If the input metrics could be identified from the names of the inputs to the model, the predictive metric was created automatically during the import process. For this to have occurred, you must have named the metric column when you created shadow metrics for the data mart report. If any of the required input metrics are not found, you are prompted to select the appropriate inputs.
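The matching described above can be pictured as a lookup of model input names against the metric column aliases defined on the shadow metrics. The following Python sketch assumes exact name matching, which is an assumption: the source does not state how the import process compares names. The function match_inputs and all names in the example are hypothetical.

```python
def match_inputs(model_inputs, metric_aliases):
    """Split model input names into those with a matching alias and those without.

    Assumes exact, case-sensitive matching (not confirmed by the source).
    """
    available = set(metric_aliases)
    matched = [name for name in model_inputs if name in available]
    missing = [name for name in model_inputs if name not in available]
    return matched, missing

# Hypothetical model inputs and shadow metric aliases.
model_inputs = ["Gender", "AgeRange", "Education", "HouseholdCount"]
metric_aliases = ["Gender", "AgeRange", "Education"]

matched, missing = match_inputs(model_inputs, metric_aliases)
print("matched:", matched)
print("missing:", missing)
```

In this sketch an empty missing list corresponds to the fully automatic case, while any remaining names correspond to the inputs you would be prompted to select.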
A sample data mart report for this example can be found in the Creating a data mart section.
You want to use all of these attributes, except for customer name, as predictors in the predictive model. Therefore, you must create a shadow metric for each, since the predictive metric accepts only metrics as inputs. Some example shadow metrics for this report are shown below:
Max([Customer Age Range]@DESC) {Customer}
Max([Customer Age Range]@ID) {Customer}
Max([Customer Education]@DESC) {Customer}
Filter the data mart report to include only the customers acquired before October 1, 2000, since newer customers would not have been included in the fall 2000 campaign. Once the data mart report is complete, use it to generate the data mart.
Using a third-party data mining tool, import and then analyze the data set represented by the data mart. Even though a large number of attributes were included, the resulting model includes only those attributes that were established as strong predictors: gender, age, education, and household count.
Construct a predictive model and import its XML representation into MicroStrategy. The result of the importation is a predictive metric called ResponsePredictor, which incorporates those attributes. The predictive metric, which uses the DmxAngossNum function, is shown below:
The analytical function DmxAngossNum actually calls a third-party neural network algorithm, using the MicroStrategy function plug-in technology. This neural network algorithm has been trained using the data from the data mart. Now that the metric has been created, validate it against the original data in a report like the one below, which compares the actual response with the response calculated by the predictive metric.
This report shows that the metric correctly predicted 792 out of 842 responses, which corresponds to 94 percent accuracy. Of the 177 customers who responded positively, it correctly predicted 144 of them, or 81 percent. This accuracy is acceptable for marketing purposes. Finally, you can use the metric to predict the responses of the customers who were not used in developing the model, that is, the customers acquired on or after 10/1/2000. The report below shows that out of 600 new customers, 93 are likely to respond positively to a campaign similar to that of fall 2000:
Based on these results, the number of customers targeted can be greatly reduced, from 600 to 93. Costs for the campaign will decrease while the percentage of positive responses will likely increase significantly.
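The figures quoted in this example can be checked with a few lines of arithmetic. The Python sketch below recomputes the two percentages from the counts given in the text and derives the remaining error counts, assuming the validation report compares only two outcomes (positive and negative response). The variable names are illustrative.

```python
# Counts stated in the validation report discussion.
total_responses = 842
correct_predictions = 792
actual_positives = 177
correct_positives = 144

# Overall accuracy and the share of positive responders found.
accuracy = correct_predictions / total_responses
positive_hit_rate = correct_positives / actual_positives

# Derived counts, assuming a two-outcome (positive/negative) comparison.
missed_positives = actual_positives - correct_positives
correct_negatives = correct_predictions - correct_positives
actual_negatives = total_responses - actual_positives
false_positives = actual_negatives - correct_negatives

# Share of new customers that would still be targeted.
new_customers = 600
predicted_responders = 93
targeted_share = predicted_responders / new_customers

print(f"accuracy: {accuracy:.1%}")
print(f"positive responders found: {positive_hit_rate:.1%}")
print(f"missed positives: {missed_positives}, false positives: {false_positives}")
print(f"share of new customers targeted: {targeted_share:.1%}")
```

Under that assumption, the 50 incorrect predictions split into 33 missed positive responders and 17 customers wrongly predicted to respond, and the campaign would reach about 15.5 percent of the new customers.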