Objective Week1 6 July 2012
Customer Retention
Prakhar Deep Gupta
Business Objective/Problem
Cross-selling to improve revenue per customer. Cross-selling is also a way to strengthen the relationship with the customer and improve customer retention. The objective is to identify the customers with the potential for cross-selling opportunities.
Data Inputs
o Transactional data of customers obtained from POS
o Resident of which state
o Annual income
o No. of dependents
o Credit class/category
o No. of transactions per month
o Average balance per month
Logistic Regression
The modeling data set is first used in a univariate logistic regression analysis, fitting one factor at a time, to estimate how many factors significantly affect the cross-sell probability. The same data set is then used in a multivariate logistic regression analysis to determine the final set of significant factors, using pseudo R-squared values as well as p-values. Once the significant factors are identified, the resulting model is fed the validation data set, and the ROC curves are studied to determine the predictive power of the model. The prediction data set is then input, and the model provides the predicted cross-sell probabilities. As a final step, cluster analysis is performed on the logistic regression model output for segmentation.
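The fit-then-score part of this workflow can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual implementation: the batch gradient ascent fitter stands in for whatever statistical package was used, and the two features, the toy data values, and the names `fit_logistic`, `predict_proba`, `X_model`, `y_model` and `X_predict` are all hypothetical. Factor screening, validation and clustering are not shown.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, row):
    """Predicted cross-sell probability; weights[0] is the intercept."""
    score = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return sigmoid(score)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, target in zip(X, y):
            err = target - predict_proba(weights, row)
            grad[0] += err
            for j, x in enumerate(row, start=1):
                grad[j] += err * x
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights

# Hypothetical, standardized modeling data (assumption, not from the project):
# each row is [transactions per month, average balance per month].
X_model = [[-1.5, -1.0], [-1.0, -0.5], [-0.5, -1.2], [-0.2, 0.3], [0.2, -0.3],
           [0.6, 0.8], [1.0, 0.4], [1.4, 1.2], [0.9, 0.7], [-0.8, -0.7]]
# 1 = customer took up a cross-sell offer, 0 = did not.
y_model = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]

weights = fit_logistic(X_model, y_model)

# Hypothetical prediction data set: customers to be scored by the fitted model.
X_predict = [[1.2, 1.0], [-1.3, -0.9], [0.0, 0.1]]
probabilities = [predict_proba(weights, row) for row in X_predict]
```

The list `probabilities` plays the role of the predicted cross-sell probabilities that the later segmentation step would consume.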
(Pr>ChiSq) > 0.05: We fail to reject the null hypothesis and conclude that the regression coefficient for the given factor has not been found to be statistically different from zero in estimating cross-selling opportunities, given the presence of other factors in the model.
(Pr>ChiSq) <= 0.05: We reject the null hypothesis and conclude that the regression coefficient for the given factor has been found to be statistically different from zero in estimating cross-selling opportunities, given the presence of other factors in the model.
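The decision rule above can be sketched in a few lines of Python. The sketch assumes that Pr>ChiSq is the p-value of a Wald chi-square statistic with one degree of freedom, computed as the squared ratio of a coefficient to its standard error; the source does not state which test produced the values. The factor names, coefficients and standard errors below are hypothetical and are not taken from Exhibit 4.

```python
import math

def wald_p_value(coefficient, std_error):
    """P-value of the Wald chi-square statistic with 1 degree of freedom."""
    chi_sq = (coefficient / std_error) ** 2
    # For 1 degree of freedom, P(ChiSq > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_sq / 2.0))

def is_significant(p_value, alpha=0.05):
    """Reject the null hypothesis (coefficient = 0) when Pr>ChiSq <= alpha."""
    return p_value <= alpha

# Hypothetical (coefficient, standard error) pairs, for illustration only.
estimates = {"factor_a": (0.8, 0.2), "factor_b": (0.1, 0.2)}

decisions = {
    name: is_significant(wald_p_value(coef, se))
    for name, (coef, se) in estimates.items()
}
```

Here `factor_a` has a large coefficient relative to its standard error and is flagged as significant, while `factor_b` is not.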
Conclusion:
1. The above analysis shows that, with the exception of condition of account (condition_of_accnt), all the factors shown in Exhibit 4 have a significant bearing on the final outcome.
2. Also, the credit limit has a Pr>ChiSq value of 0.048, which is only just below the 0.05 threshold and therefore very close to being an insignificant factor, suggesting that a credit limit increase may not necessarily translate into higher spending by the consumer.
Validation:
The technique is already in wide use, and plenty of research articles have been written on it. Also, since the final output shows a high degree of correlation, it suggests that the factors identified are correct and that the data can be used for predictive analysis.
Project Outputs
Logistic Regression
List of significant factors and the area under the ROC curve. The set of predicted probabilities for each customer.
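As a rough illustration of the area-under-ROC output, the sketch below computes it directly from predicted probabilities and observed outcomes, using the fact that the area equals the share of (responder, non-responder) pairs in which the responder receives the higher score, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical, chosen only to show the calculation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))

# Hypothetical validation outcomes (1 = responded) and model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]

auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to a model that ranks customers no better than chance, and 1.0 to a model that ranks every responder above every non-responder.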
CHAID
The visual representation of the tree, and the ROC and lift curves. The final set of prospects/leads from amongst the customers. A confusion matrix is also obtained; it shows the proportion of misclassified items and is an indicator of the model's efficacy and predictive power.
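The confusion matrix and a single point on the lift curve can be sketched as follows. The 0.5 cut-off, the 50% targeting fraction, the function names and the labels and scores are assumptions made for illustration; they do not come from the project's CHAID output.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at a given score cut-off."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def misclassification_rate(counts):
    """Proportion of customers placed in the wrong class."""
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total

def lift_at(labels, scores, fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical outcomes (1 = responded) and model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]

matrix = confusion_matrix(labels, scores)
error_rate = misclassification_rate(matrix)
lift_top_half = lift_at(labels, scores, 0.5)
```

A lift above 1 for the top-scored group means that targeting those customers reaches responders at a higher rate than contacting customers at random.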
References
http://arxiv.org/ftp/arxiv/papers/1002/1002.1144.pdf (IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 1, No. 1, January 2010)
http://www.nesug.org/proceedings/nesug98/solu/p095.pdf