Buy Till You Die (BTYD) Models for Customer Lifetime Value Calculation

Mahesh Divakaran
5 min read · Apr 26, 2019


Calculating customer lifetime value is complex, and the use of familiar regression-type models — which attempt to forecast future behavior based on only observable measures — is problematic and inadequate. A better approach is to perform the calculations using a probability model of buyer behavior, in which observed behavior is viewed as the outcome of a random process governed by latent characteristics. Companies that are serious about valuing their customer base must embrace this unconventional yet superior method.

Problems with Traditional Approaches (Simple RFM and Regression)

First, regression-type models are ad hoc; there is no well-established theory behind them. Explanatory variables (including demographics, marketing variables, and behavioral measures such as RFM) are often added to the model simply because they raise R-squared. There is generally no compelling "story" behind many of these measures and their relationships with CLV.
Without a basis to justify the particular relationships uncovered by the model, it is hard to trust its predictions.
Second, regression models (and other "data-mining" procedures) are designed to predict behavior in the next period. But when computing CLV we are interested not just in period 2; we also need to predict behavior in periods 3, 4, 5, and so on.
As we move further from the period for which we have actual values of the predictor variables, it becomes increasingly difficult to derive the expected value of the dependent variable, and long-term forecasts become highly unreliable as a result.
Third, these models ignore the fact that the observed RFM variables are only imperfect indicators of underlying behavioral characteristics; unlike demographics, they are not fixed attributes of the customer.

The Alternative Approach (BTYD Models)

We have only a "foggy window" as we attempt to see our customers' true behavioral tendencies, and therefore the past is not a perfect mirror of the future. A probability model therefore works in two steps: it first infers a customer's latent characteristics from the observed data (θ̂ = f(past)) and then predicts future behavior from those characteristics (future = f(θ̂)). Contrasting this two-step approach with the single-step regression model (future = f(past)), we find that the use of a formal probability model avoids the shortcomings associated with regression-type models. First, there is no need to split the observed transaction data into two periods to create a dependent variable; we can use all of the data to make inferences about the customer's behavioral characteristics. Second, we can predict behavior over future time periods of any length; we can even derive an explicit expression for CLV over an infinite horizon.

Our Assumptions

In this model, we first assume that the amount spent per transaction is independent of the transaction process. This means our model of buyer behavior can be separated into a sub-model for the flow of transactions and a sub-model for revenue per transaction.
Our model for the transaction stream is based on the following assumptions:
• A customer's relationship with the firm has two phases: he or she is "alive" for an unobserved period of time and then becomes permanently inactive. This switch to inactivity need not occur in the short or medium term; in some cases, it may never occur during the customer's physical lifetime.
• While alive, a customer “randomly” purchases around his or her mean transaction rate.
• Both the transaction rates and dropout rates vary across customers.
Our model for the spending process is based on the following assumptions:
• The dollar value of a customer’s given transaction randomly varies around his or her mean transaction value.
• Mean transaction values vary across customers but do not vary over time for any given individual.
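The assumptions above are enough to simulate a customer from scratch. The sketch below assumes a Pareto/NBD-style specification (Poisson purchasing while alive, an exponentially distributed lifetime, gamma heterogeneity across customers) plus an independent gamma spend model; all parameter values are illustrative choices, not estimates from the paper.

```python
import random

random.seed(42)

def simulate_customer(horizon_weeks=78):
    """Simulate one customer's purchase history under the stated
    assumptions: a transaction sub-model and an independent spend sub-model."""
    # Latent traits vary across customers (gamma heterogeneity).
    lam = random.gammavariate(2.0, 0.05)         # weekly purchase rate while alive
    mu = random.gammavariate(1.0, 0.02)          # weekly dropout rate
    mean_spend = random.gammavariate(4.0, 10.0)  # this customer's mean $/transaction

    # "Alive" for an unobserved period, then permanently inactive.
    lifetime = random.expovariate(mu)

    t, purchases = 0.0, []
    while True:
        t += random.expovariate(lam)  # waiting time to the next purchase
        if t > min(lifetime, horizon_weeks):
            break
        # Spend varies randomly around the customer's own (fixed) mean.
        spend = random.gammavariate(4.0, mean_spend / 4.0)
        purchases.append((round(t, 1), round(spend, 2)))
    return purchases

history = simulate_customer()
print(f"{len(history)} purchases: {history}")
```

Because the spend draws never depend on the transaction times, the two sub-models can be estimated and analyzed separately, which is exactly what the independence assumption buys us.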

Recall that CLV is defined as the present value of the future cash flows associated with a customer. A consequence of our assumption that monetary value is independent of the underlying transaction process is that the net cash flow per transaction can be factored out of the calculation, which means we focus on forecasting the "flow" of transactions (discounted to yield a present value). This number of discounted expected transactions (DET) can then be rescaled by a net cash flow "multiplier" to yield an overall estimate of expected CLV:

E(CLV) = E(net cash flow/transaction) × DET
This decomposition offers two significant benefits.
First, it breaks down and simplifies the computational steps associated with the model.
Second, it offers diagnostic benefits that can assist a firm in identifying problem areas and determining how to allocate marketing resources to address them.
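As a minimal numeric sketch of the decomposition: the per-period expected-transaction forecasts, the discount rate, and the margin below are all made-up illustrative values, not figures from the paper.

```python
# Hypothetical forecast of expected transactions in each future period
# (illustrative numbers only).
expected_transactions = [1.2, 0.9, 0.7, 0.55, 0.45]
discount_rate = 0.10           # per-period discount rate (assumed)
margin_per_transaction = 25.0  # E(net cash flow / transaction), assumed

# DET: discounted expected transactions.
det = sum(x / (1 + discount_rate) ** t
          for t, x in enumerate(expected_transactions, start=1))

# E(CLV) = E(net cash flow per transaction) x DET
expected_clv = margin_per_transaction * det
print(f"DET = {det:.3f}, E(CLV) = ${expected_clv:.2f}")
```

Note how the margin enters only as a final multiplier: any diagnosis of a low CLV can be traced either to a weak transaction flow (low DET) or to thin margins, which is the diagnostic benefit mentioned above.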

An Illustrative Application

Consider two customers observed over a 78-week period. One made seven repeat purchases, the last occurring in week 35 (frequency = 7, recency = 35); the other made a single repeat purchase in week 20 (frequency = 1, recency = 20). Both have an approximate CLV of $10. In general, for customers with low recency, higher frequency is actually a bad sign. Initially, this might seem like a mistake in the model, but upon further reflection, it starts to make sense. If we knew for sure that both customers were still active in week 78, we would expect the customer who made seven repeat purchases to have the greater CLV, in light of his or her higher number of past purchases. However, that RFM profile (specifically, high frequency but low recency) suggests that, most likely, he or she is no longer active at the end of week 78. The second customer, on the other hand, has a lower underlying purchase rate, so it is reasonably likely that he or she is still active by week 78, even without a purchase in the past 58 weeks. The net effect is that both customers are estimated to have the same CLV, despite their very different past purchase histories.
The backward-bending iso-value curves that result emphasize the importance of using a model with sound behavioral assumptions; an ad hoc regression-type model would probably miss this pattern and lead to faulty inferences over a large portion of the recency-frequency space.
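This intuition can be made concrete with the closed-form "probability alive" expression from the BG/NBD variant of this model family. The sketch below uses the BG/NBD parameter estimates published for the CDNOW data set (r, α, a, b) purely for illustration; treat both the formula choice and the parameter values as assumptions rather than the exact model behind the figures quoted above.

```python
def p_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer with x repeat purchases,
    the last at time t_x, observed for T periods, is still 'alive'."""
    if x == 0:
        # In BG/NBD a customer can only drop out right after a purchase,
        # so with no repeat purchases dropout cannot yet have occurred.
        return 1.0
    ratio = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + ratio)

# Assumed parameters: the BG/NBD estimates reported for the CDNOW data.
r, alpha, a, b = 0.243, 4.414, 0.793, 2.426

heavy = p_alive(x=7, t_x=35, T=78, r=r, alpha=alpha, a=a, b=b)
light = p_alive(x=1, t_x=20, T=78, r=r, alpha=alpha, a=a, b=b)
print(f"frequency=7, recency=35: P(alive) = {heavy:.2f}")
print(f"frequency=1, recency=20: P(alive) = {light:.2f}")
```

Under these parameters the frequent-but-long-silent customer comes out far less likely to still be alive than the single-purchase customer, which is exactly the mechanism that makes the two CLV estimates converge.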

References:

  1. https://faculty.wharton.upenn.edu/wp-content/uploads/2013/08/fader_et_al_mr_06.pdf
