
The cumulative accuracy profile (CAP) curve is used to compare the accuracy of different models when a targeting approach is used.


• The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis
o In practice: total observations (x-axis) vs. class 1 (1's) observations (y-axis)
• It compares a model with a perfect classification model and a random classification model
o perfect CAP: the maximum number of positive outcomes is reached directly
o random CAP: positive outcomes are distributed evenly
• A better model lies closer to the perfect CAP
• The CAP curve analyses how effectively all data points of a given class can be identified using the minimum number of tries.

The process for plotting and subsequently analysing the CAP curve is as follows:
Step 1: Calculate the total number of observations and plot it on the X-axis
Step 2: Calculate the max number of positives possible (actual positives) and plot on the Y axis
Step 3: Plot the random model, a straight line along which positives accumulate linearly across the graph
Step 4: Plot the best-case (perfect) model, which detects all positive outcomes (1) in as many tries as there are positive data points
Step 5: Plot your model(s) on the graph and select the one that is closest to the perfect curve
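
The plotting steps above can be sketched in code. This is a minimal sketch in plain Python, assuming the model produces one score per observation where a higher score means "more likely class 1". The function names `cap_curve`, `random_cap` and `perfect_cap` are hypothetical and not part of the original notes; the returned points are fractions, so both axes run from 0 to 1.

```python
def cap_curve(y_true, scores):
    """Return (xs, ys) for a model's CAP curve, both as fractions in [0, 1].

    y_true: list of 0/1 labels.
    scores: model scores, higher = more likely class 1 (assumption).
    """
    n = len(y_true)
    total_pos = sum(y_true)
    # Try the observations the model considers most likely positive first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    xs, ys = [0.0], [0.0]
    found = 0
    for tries, i in enumerate(order, start=1):
        found += y_true[i]
        xs.append(tries / n)          # share of observations tried so far
        ys.append(found / total_pos)  # share of positives found so far
    return xs, ys


def random_cap(n):
    """Random model: positives accumulate linearly (the diagonal)."""
    xs = [k / n for k in range(n + 1)]
    return xs, list(xs)


def perfect_cap(n, total_pos):
    """Perfect model: every try is a positive until all positives are found."""
    xs = [k / n for k in range(n + 1)]
    ys = [min(k, total_pos) / total_pos for k in range(n + 1)]
    return xs, ys


# Illustrative data only (not from the source).
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4, 0.5, 0.05]
model_xs, model_ys = cap_curve(labels, scores)
```

The three point lists can be handed to any plotting library; the model that hugs the perfect curve most closely is the one to prefer.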

CAP Analysis Using Area under the Curve


1> Calculate the area between the perfect model curve and the random model line (aP)
2> Calculate the area between the prediction model curve and the random model line (aR)
3> Calculate the Accuracy Rate (AR) = aR / aP
4> The closer the Accuracy Rate is to 1, the better the model.
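
The area calculation above can be sketched as follows. This is a minimal sketch, assuming the CAP curve is given as fractions on both axes (0 to 1) and is treated as piecewise linear; the helper names `area_under` and `accuracy_rate` are hypothetical. The area under the random line is then 0.5, and the perfect curve rises to 1 at x equal to the share of positives and stays flat afterwards.

```python
def area_under(xs, ys):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2
        for k in range(1, len(xs))
    )


def accuracy_rate(xs, ys, positive_fraction):
    """AR = aR / aP, with both areas measured down to the random line."""
    a_random = 0.5  # area under the diagonal on a unit square
    a_r = area_under(xs, ys) - a_random
    # Perfect curve: triangle up to x = positive_fraction, then a flat top.
    a_p = (1 - positive_fraction / 2) - a_random
    return a_r / a_p


# Illustrative curves, assuming 10% of observations are positive.
perfect = ([0.0, 0.1, 1.0], [0.0, 1.0, 1.0])
random_line = ([0.0, 1.0], [0.0, 1.0])
ar_perfect = accuracy_rate(*perfect, positive_fraction=0.1)
ar_random = accuracy_rate(*random_line, positive_fraction=0.1)
```

By construction the perfect curve scores an AR of 1 and the random line scores 0, so a real model normally falls somewhere in between.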
Using a Plot
1> Draw a vertical line at 50% on the X-axis up to the model curve, then a horizontal line from the intersection to the Y-axis
2> Read off the percentage of class 1 observations identified, relative to the total count of class 1 labels
3> Below 60%: bad model; 80-90%: good model
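
The 50% reading can also be taken numerically instead of by eye. This is a minimal sketch, assuming the CAP curve is available as two lists of fractions and linear interpolation between points is acceptable; `captured_at` and `rate_model` are hypothetical names. Only the two bands stated above are encoded, and anything outside them is reported as not covered rather than guessed.

```python
def captured_at(xs, ys, x_cut=0.5):
    """Share of class 1 found after trying the first x_cut of observations."""
    for k in range(1, len(xs)):
        if xs[k] >= x_cut:
            x0, x1 = xs[k - 1], xs[k]
            y0, y1 = ys[k - 1], ys[k]
            if x1 == x0:
                return y1
            # Linear interpolation between the two neighbouring points.
            return y0 + (y1 - y0) * (x_cut - x0) / (x1 - x0)
    return ys[-1]


def rate_model(share_at_half):
    """Apply only the thresholds given in the notes."""
    if share_at_half < 0.60:
        return "bad"
    if 0.80 <= share_at_half <= 0.90:
        return "good"
    return "not covered by the stated thresholds"


# Illustrative curve only (not from the source).
xs = [0.0, 0.25, 0.75, 1.0]
ys = [0.0, 0.50, 0.90, 1.0]
share = captured_at(xs, ys)
verdict = rate_model(share)
```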

Example: Maximize the click-through rate of an advertisement


In a random case: if 100 clicks are found in a DB of 1000 customers, the clicks are spread evenly along a straight line (1 click per 10 reach-outs, 10 clicks in 100 customer reach-outs).
With targeting: the model should use its predictions to reach all the customers most likely to click in the minimum number of reach-outs, i.e. 100 reach-outs, 100 clicks.
• Your model cannot improve the overall performance of your campaign. If only 10% of people will click your advertisement, the model should ensure that the first 10% of people you reach out to all respond positively (that is, it should predict which 10% of the population to target)
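
The arithmetic of the example can be checked with a few lines. This is a minimal sketch using the numbers from the example (1000 customers, 100 clicks); the function names are hypothetical, and the "random" figure is the expected number of clicks, not a guaranteed count.

```python
CUSTOMERS = 1000  # from the example
CLICKS = 100      # from the example


def expected_clicks_random(reach_outs):
    """Random targeting: clicks accumulate at the overall click rate."""
    return reach_outs * CLICKS / CUSTOMERS


def clicks_perfect(reach_outs):
    """Perfect targeting: every reach-out is a click until none are left."""
    return min(reach_outs, CLICKS)


random_after_100 = expected_clicks_random(100)
perfect_after_100 = clicks_perfect(100)
perfect_after_500 = clicks_perfect(500)
```

Contacting more customers beyond the first 100 adds nothing under perfect targeting, which is why the model changes the order of reach-outs and not the total number of clicks available.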
Case: Model A predicted 80% of the actual positives in the first 100 observations; Model B predicted 80% of the actual positives in the first 200 observations. Which of the two is closer to the best-case (perfect) curve?
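
One way to reason about the case is to compare how many reach-outs each model needs to capture 80% of the positives. This is a minimal sketch, assuming the case refers to the same campaign as the example (100 clicks among 1000 customers), which the notes do not state explicitly; `closest_to_perfect` is a hypothetical helper.

```python
# Assumption: same campaign as in the example above.
TOTAL_CUSTOMERS = 1000
TOTAL_CLICKS = 100
TARGET_PCT = 80  # share of actual positives to capture

reach_outs_needed = {
    # Perfect model: every reach-out is a click.
    "perfect": TOTAL_CLICKS * TARGET_PCT // 100,
    "Model A": 100,  # from the case
    "Model B": 200,  # from the case
    # Random model: positives are spread evenly over all customers.
    "random": TOTAL_CUSTOMERS * TARGET_PCT // 100,
}


def closest_to_perfect(needed):
    """Return the candidate model needing the fewest extra reach-outs."""
    best = needed["perfect"]
    candidates = {
        name: tries
        for name, tries in needed.items()
        if name not in ("perfect", "random")
    }
    return min(candidates, key=lambda name: candidates[name] - best)


winner = closest_to_perfect(reach_outs_needed)
```

Under these assumptions the perfect model needs 80 reach-outs and the random model 800, so Model A (100) sits closer to the perfect curve than Model B (200).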
