Business Analytics
Sample = 1000 observations
600 observations = training data (seen data), used to create and train the model
400 observations = validation data, used for validating the goodness of fit
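The 600/400 partition above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `partition`, the fixed seed, and the integer record IDs standing in for real observations are all assumptions.

```python
import random

def partition(records, n_train, seed=0):
    """Randomly split records into a training set and a validation set."""
    shuffled = list(records)               # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) for repeatability
    return shuffled[:n_train], shuffled[n_train:]

# Integer IDs stand in for the 1000 observations in the sample.
sample = range(1000)
train, validation = partition(sample, n_train=600)
print(len(train), len(validation))
```

The model is fitted on `train` only; `validation` is held back so that goodness of fit is judged on records the model has not seen.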
Unsupervised Learning:
Here, we don't have any outcome of interest. For example, the probability that Surf Excel buyers also buy a cloth brush.
Steps in Data Mining:
1. Develop an understanding of the purpose
2. Obtain the dataset to be used in the analysis
3. Explore, clean, and preprocess the data
4. Reduce the data if necessary
5. Determine the data mining task (classification, prediction, clustering)
6. Choose the data mining techniques to be used
7. Use algorithms to perform the task
8. Interpret the results
Y = f(Xi)
Data Mining for Business Intelligence, by Galit Shmueli, Nitin Patel, and Peter Bruce
Naive Rule: classifies every record as belonging to the most prevalent class. For example, it classifies all companies as truthful because 90% of the companies investigated in the training set were found to be truthful. The naive rule is used mainly as a baseline for evaluating the performance of more complicated classifiers.
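As a rough sketch of the naive rule used as a baseline: the label lists below are hypothetical, chosen only to mirror the 90% truthful share mentioned above, and the helper name `naive_rule` is an assumption.

```python
from collections import Counter

def naive_rule(train_labels):
    """Return the most prevalent class among the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

# Hypothetical training labels: 90% truthful, 10% fraud.
train_labels = ["truthful"] * 90 + ["fraud"] * 10
majority = naive_rule(train_labels)

# Hypothetical validation labels, used only to score the baseline.
validation_labels = ["truthful"] * 36 + ["fraud"] * 4
accuracy = sum(y == majority for y in validation_labels) / len(validation_labels)
print(majority, accuracy)
```

A more complicated classifier is worth its cost only if it beats this baseline accuracy on the validation data.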
Naive Bayes: the probability of a record belonging to a certain class is evaluated based not only on the prevalence of that class but also on the additional information in the record's predictor values. Naive Bayes works only with categorical predictors; numerical predictors must be binned and converted to categorical variables.
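Binning a numerical predictor might look like the sketch below. The cut points, the category labels, and the revenue figures are hypothetical values picked for illustration; the notes do not specify how bins should be chosen.

```python
from bisect import bisect_right

def bin_value(value, cut_points, labels):
    """Map a number to the label of the interval it falls in.

    cut_points must be sorted, and labels needs one more entry than cut_points.
    A value equal to a cut point goes into the higher bin.
    """
    return labels[bisect_right(cut_points, value)]

cut_points = [100, 500]                # hypothetical thresholds
labels = ["small", "medium", "large"]  # hypothetical categories
revenues = [40, 100, 250, 900]         # hypothetical numerical predictor values
sizes = [bin_value(v, cut_points, labels) for v in revenues]
print(sizes)
```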
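$$
P(\text{Fraud} \mid \text{Charges} = Y, \text{Size} = \text{small}) = \frac{P(\text{Charges} = Y, \text{Size} = \text{small} \mid \text{Fraud})\, P(\text{Fraud})}{P(\text{Charges} = Y, \text{Size} = \text{small} \mid \text{Fraud})\, P(\text{Fraud}) + P(\text{Charges} = Y, \text{Size} = \text{small} \mid \text{Truth})\, P(\text{Truth})}
$$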
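The Bayes calculation for the fraud class can be checked numerically. The ten records below are made up purely for illustration (4 fraudulent, 6 truthful), and the helper names are assumptions. The sketch estimates the joint conditional probability directly from counts, as in the formula, rather than applying the naive independence assumption across predictors.

```python
# Each hypothetical record: (charges_filed, company_size, actual_class)
records = [
    ("y", "small", "fraud"), ("n", "small", "fraud"),
    ("y", "large", "fraud"), ("n", "large", "fraud"),
    ("y", "small", "truth"), ("n", "small", "truth"), ("n", "small", "truth"),
    ("n", "large", "truth"), ("n", "large", "truth"), ("n", "large", "truth"),
]

def prior(cls):
    """P(class): share of all records that belong to cls."""
    return sum(r[2] == cls for r in records) / len(records)

def likelihood(charges, size, cls):
    """P(Charges, Size | class): share of cls records with this predictor profile."""
    in_class = [r for r in records if r[2] == cls]
    return sum(r[0] == charges and r[1] == size for r in in_class) / len(in_class)

def posterior_fraud(charges, size):
    """P(Fraud | Charges, Size) by Bayes' rule over the two classes."""
    fraud_term = likelihood(charges, size, "fraud") * prior("fraud")
    truth_term = likelihood(charges, size, "truth") * prior("truth")
    return fraud_term / (fraud_term + truth_term)

posterior = posterior_fraud("y", "small")
print(round(posterior, 3))
```

With these made-up counts the prior P(Fraud) is 0.4, and seeing Charges = Y with Size = small moves the estimate away from that prior. This is the sense in which the method uses additional information beyond class prevalence.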