$\begingroup$

I'm trying to build a CLTV (customer lifetime value) model for customers who purchase products over time. I'm new to CLTV, so I have some questions to clarify:

  1. Since each customer was acquired at a different point in time, what should the modeling data look like? Should I include all previous transactions for each customer? The data includes transactions dating back to 2013. My current idea: first, scope the customers to include in modeling. My approach is to keep customers who made at least one purchase between 2020 and 2022, so that they are at least somewhat active. With that list of customers, I'll go to the transaction table and pull ALL transactions for each of them, no matter how far back they date, to create features such as recency, frequency, average order value, and so on. This becomes my feature table. My target would be the actual sales amount over one year, e.g. 2023, for each customer; this target table will be left joined to the feature table. Some customers will end up with a 2023 sales amount of 0, so I plan to use a Tweedie distribution to predict the total sales amount.
  2. Does my approach make sense? And for the target table, does the total sales amount over a given time period count as CLTV? I've found various definitions of CLTV: some multiply average order value, purchase frequency, and time span; others include churn rate, retention rate, gross margin, and so on. If CLTV really has these different components, should I model them separately and then aggregate them? Another reason I chose Tweedie is that it handles both frequency and severity (monetary value) in one model, so I don't need to build a separate model for each. But how can I account for the "time span" factor in the model?
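
To make the setup in item 1 concrete, here is a minimal sketch of the feature and target tables, using only the Python standard library and made-up transactions. The sample data, the field names, the helper `build_tables`, and the `tenure_days` feature are all assumptions for illustration. The sketch also assumes the features stop at the start of the target year, so that 2023 purchases do not leak into them.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2013, 5, 1), 40.0),
    ("A", date(2021, 3, 10), 60.0),
    ("A", date(2023, 2, 1), 50.0),
    ("B", date(2022, 11, 20), 100.0),
    ("C", date(2015, 7, 4), 30.0),   # no purchase in 2020-2022, so excluded
    ("C", date(2023, 6, 1), 25.0),
]

def build_tables(transactions, active_start, cutoff, target_end):
    """Features use purchases before `cutoff`; target sums [cutoff, target_end)."""
    history = defaultdict(list)
    target = defaultdict(float)
    for cust, day, amount in transactions:
        if day < cutoff:
            history[cust].append((day, amount))
        elif day < target_end:
            target[cust] += amount

    rows = {}
    for cust, purchases in history.items():
        days = [d for d, _ in purchases]
        # Scope: at least one purchase in the activity window [active_start, cutoff).
        if max(days) < active_start:
            continue
        amounts = [a for _, a in purchases]
        rows[cust] = {
            "recency_days": (cutoff - max(days)).days,
            "frequency": len(purchases),
            "avg_order_value": sum(amounts) / len(amounts),
            # One possible way to expose the "time span" to the model.
            "tenure_days": (cutoff - min(days)).days,
            # Left join: customers with no purchase in the target window get 0.
            "target_sales": target.get(cust, 0.0),
        }
    return rows

rows = build_tables(transactions, date(2020, 1, 1), date(2023, 1, 1), date(2024, 1, 1))
for cust, row in sorted(rows.items()):
    print(cust, row)
```

In this toy data, customer B is active in 2020 to 2022 but buys nothing in 2023, which produces the zero target that motivates the Tweedie choice.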
$\endgroup$
