
.otherwise(F.date_add(F.col("Paid To Date Last Payment"), 1))) \
.withColumn("Payment Gap", F.datediff(F.col("Paid From Date"), F.col("Paid To Date Last Payment adj")))

It may be easier to explain the above steps with visuals. As shown in the table below, the window function “F.lag” is called to return the “Paid To Date Last Payment” column, which within each policyholder window is the “Paid To Date” of the previous row, as indicated by the blue arrows. This is then compared against the “Paid From Date” of the current row to arrive at the Payment Gap. As expected, policyholder B has a Payment Gap of 14 days.

For the purpose of calculating the Payment Gap, Window_1 is used, as the claims payments need to be in chronological order for the “F.lag” function to return the desired output.

Table 3: Derive Payment Gap. Table by author
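For readers running these steps in isolation, below is a minimal, self-contained sketch of the Payment Gap derivation. The sample rows, the preceding “F.when” condition for each policyholder's first payment (which has no previous row to lag to), and the exact Window_1 definition are assumptions based on the earlier parts of this article rather than a verbatim copy.

from datetime import date
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical claim payments for one policyholder (illustrative data only)
df = spark.createDataFrame(
    [("B", date(2021, 1, 1), date(2021, 1, 31)),
     ("B", date(2021, 2, 15), date(2021, 3, 15))],
    ["Policyholder ID", "Paid From Date", "Paid To Date"],
)

# Window_1: one window per policyholder, ordered chronologically so that
# F.lag can look back at the previous payment row
Window_1 = Window.partitionBy("Policyholder ID").orderBy("Paid From Date")

df = (
    df.withColumn("Paid To Date Last Payment",
                  F.lag("Paid To Date", 1).over(Window_1))
      # The first payment has no previous row; here the adjusted date defaults
      # to the current "Paid From Date" (an assumed handling of the null case)
      .withColumn("Paid To Date Last Payment adj",
                  F.when(F.col("Paid To Date Last Payment").isNull(),
                         F.col("Paid From Date"))
                   .otherwise(F.date_add(F.col("Paid To Date Last Payment"), 1)))
      .withColumn("Payment Gap",
                  F.datediff(F.col("Paid From Date"),
                             F.col("Paid To Date Last Payment adj")))
)

df.show()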

Adding the finishing touch below gives the final Duration on Claim, which is now one-to-one against the
Policyholder ID.

.withColumn("Payment Gap - Max", F.max("Payment Gap").over(Window_2)) \

.withColumn("Duration on Claim - Final", F.col("Duration on Claim - per Policyholder") - F.col("Payment Gap -


Max"))

The table below shows all the columns created with the Python code above.

Table 4: All columns created in PySpark. Table by author

Payout Ratio
The Payout Ratio is defined as the actual Amount Paid for a policyholder, divided by the Monthly Benefit
for the duration on claim. This measures how much of the Monthly Benefit is paid out for a particular
policyholder.
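As a purely hypothetical illustration: a policyholder with a Monthly Benefit of $3,000 and a final Duration on Claim of 61 days has a Monthly Benefit Total of $3,000 × 61 / 30.5 = $6,000 (using 30.5 days as the average month length, as in the code below); if the total Amount Paid over that period is $5,400, the Payout Ratio is 5,400 / 6,000 = 0.9.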

Leveraging the Duration on Claim derived previously, the Payout Ratio can be derived using the Python code below.

.withColumn("Amount Paid Total", F.sum("Amount Paid").over(Window_2)) \


.withColumn("Monthly Benefit Total", F.col("Monthly Benefit") * F.col("Duration on Claim - Final") / 30.5) \
.withColumn("Payout Ratio", F.round(F.col("Amount Paid Total") / F.col("Monthly Benefit Total"), 1))

The outputs are as expected, as shown in the table below. To show the outputs in a PySpark session, simply add .show() at the end of the code.
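For completeness, a rough end-to-end sketch follows, assuming df and Window_2 are already defined as in the earlier steps of this article (with Window_2 assumed to partition by Policyholder ID only):

# Hypothetical end-to-end chain; df and Window_2 are assumed to be defined
# as in the earlier steps of the article
df_final = (
    df.withColumn("Payment Gap - Max", F.max("Payment Gap").over(Window_2))
      .withColumn("Duration on Claim - Final",
                  F.col("Duration on Claim - per Policyholder") - F.col("Payment Gap - Max"))
      .withColumn("Amount Paid Total", F.sum("Amount Paid").over(Window_2))
      .withColumn("Monthly Benefit Total",
                  F.col("Monthly Benefit") * F.col("Duration on Claim - Final") / 30.5)
      .withColumn("Payout Ratio",
                  F.round(F.col("Amount Paid Total") / F.col("Monthly Benefit Total"), 1))
)

# Display the outputs in the PySpark session
df_final.show()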
