Business Analytics

February 18, 2012

Algorithm types: Supervised Algorithm, Unsupervised Algorithm

Data partitions: Training Partition, Validation Partition

Sample = 1000 observations
600 observations = training partition = seen data, used to create (train) the model
400 observations = validation partition, used for validating the goodness of fit
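The 600/400 split above can be sketched in a few lines of Python. This is only an illustration: the function name `partition`, the fixed random seed, and the use of integers as stand-in records are assumptions, not part of the notes.

```python
import random

def partition(records, train_fraction=0.6, seed=0):
    """Randomly split records into a training and a validation partition."""
    shuffled = list(records)                  # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)     # seeded shuffle for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

sample = list(range(1000))                    # stand-in for 1000 observations
train, valid = partition(sample)
print(len(train), len(valid))                 # 600 400
```

The model is fitted on `train` only; `valid` is held back so that goodness of fit is measured on data the model has not seen.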

Unsupervised Learning:

Here, we don't have any outcome of interest. For example: the probability that Surf Excel buyers also buy a cloth brush.

Steps in Data Mining:

1. Develop an understanding of the purpose
2. Obtain the dataset to be used in the analysis
3. Explore, clean, and preprocess the data
4. Reduce the data if necessary
5. Determine the data mining task (classification, prediction, clustering)
6. Choose the data mining techniques to be used
7. Use algorithms to perform the task
8. Interpret the results

Need for predictive models

Y = f(Xi)

where Xi = X1, X2, X3, X4

Data Mining for Business Intelligence, by Galit Shmueli, Nitin Patel, and Peter Bruce

Naïve Rule: classifies every record as belonging to the most prevalent class. For example, it classifies all companies as being truthful because 90% of the companies investigated in the training set were found to be truthful. The naïve rule is used mainly as a baseline for evaluating the performance of more complicated classifiers.
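A minimal sketch of the naïve rule as a baseline follows. The label list is made-up data chosen to match the 90% figure above, and the helper name `naive_rule` is an assumption.

```python
from collections import Counter

def naive_rule(training_labels):
    """Return the most prevalent class; every record gets this label."""
    majority, _count = Counter(training_labels).most_common(1)[0]
    return majority

# Hypothetical training labels: 90% truthful, 10% fraud.
labels = ["truthful"] * 90 + ["fraud"] * 10

baseline = naive_rule(labels)
accuracy = sum(1 for y in labels if y == baseline) / len(labels)
print(baseline, accuracy)   # truthful 0.9
```

Any more complicated classifier should beat this accuracy to be worth using.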

Naïve Bayes: The probability of a record belonging to a certain class is evaluated based not only on the prevalence of that class but also on the additional information in the predictors. Naïve Bayes works only with predictors that are categorical. Numerical predictors must be binned and converted to categorical variables.
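Binning a numerical predictor might look like the sketch below. The cut points (100 and 1000) and the category names are purely hypothetical; in practice they would be chosen from the data.

```python
import bisect

def bin_value(value, cut_points, labels):
    """Map a number to a category; cut_points must be sorted ascending
    and labels must have exactly one more entry than cut_points."""
    return labels[bisect.bisect_right(cut_points, value)]

cut_points = [100, 1000]                  # hypothetical bin boundaries
labels = ["small", "medium", "large"]     # one label per bin

print([bin_value(v, cut_points, labels) for v in (50, 100, 5000)])
# ['small', 'medium', 'large']
```

A value equal to a cut point falls into the higher bin here; that is a design choice of this sketch, not a rule from the notes.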

Naïve Bayes Method: Classification of observations based on cut-off probability

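\[
P[\text{Fraud} \mid \text{Charges} = Y, \text{Size} = \text{small}]
= \frac{P[\text{Charges} = Y, \text{Size} = \text{small} \mid \text{Fraud}] \, P[\text{Fraud}]}
       {P[\text{Charges} = Y, \text{Size} = \text{small} \mid \text{Fraud}] \, P[\text{Fraud}]
        + P[\text{Charges} = Y, \text{Size} = \text{small} \mid \text{Truth}] \, P[\text{Truth}]}
\]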
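The calculation can be sketched in Python as below. The ten records are illustrative data, not figures from the notes, and the function name is an assumption. The sketch also makes the naïve independence assumption explicit: P[Charges, Size | class] is approximated by P[Charges | class] × P[Size | class].

```python
def naive_bayes_posterior(records, target_class, charges, size):
    """Posterior P[target_class | charges, size] under the naive assumption
    that the predictors are independent within each class."""
    scores = {}
    for cls in {label for _, _, label in records}:
        in_class = [r for r in records if r[2] == cls]
        prior = len(in_class) / len(records)
        p_charges = sum(1 for r in in_class if r[0] == charges) / len(in_class)
        p_size = sum(1 for r in in_class if r[1] == size) / len(in_class)
        scores[cls] = p_charges * p_size * prior   # one term of the formula
    total = sum(scores.values())                   # the denominator
    if total == 0:
        raise ValueError("no class has support for this predictor combination")
    return scores[target_class] / total

# Illustrative records: (charges, size, class)
records = [
    ("y", "small", "truth"), ("n", "small", "truth"), ("n", "large", "truth"),
    ("n", "large", "truth"), ("n", "small", "truth"), ("n", "small", "truth"),
    ("y", "small", "fraud"), ("y", "large", "fraud"), ("n", "large", "fraud"),
    ("y", "large", "fraud"),
]

p_fraud = naive_bayes_posterior(records, "fraud", charges="y", size="small")
cutoff = 0.5                                       # assumed cut-off probability
predicted = "fraud" if p_fraud >= cutoff else "truth"
print(round(p_fraud, 2), predicted)                # 0.53 fraud
```

With a cut-off of 0.5 the record is classified as fraud; raising the cut-off makes the classifier more reluctant to flag fraud.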
