The company is interested in identifying profitable customers who are likely to buy a ticket when given a promotional offer. My goal is to build a model that predicts whether a customer will buy a ticket, and specifically to improve recall and precision for the minority class in order to maximize profit. The target variable is imbalanced: about 9% of customers are buyers and 91% are non-buyers.
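To make "maximize profit" concrete, the decision for a single customer can be written as an expected-value rule: send the offer when p * margin - cost > 0, i.e. when the predicted purchase probability p exceeds cost / margin. The sketch below assumes a fixed profit per buyer and a fixed cost per offer; `MARGIN`, `OFFER_COST` and their values are hypothetical placeholders, and the rule is only meaningful if the model's probabilities are reasonably well calibrated.

```python
# A minimal sketch of an expected-profit decision rule.
# MARGIN and OFFER_COST are hypothetical placeholder values, not taken from the dataset.
MARGIN = 100.0     # assumed profit from one customer who buys after the offer
OFFER_COST = 5.0   # assumed cost of sending one promotional offer


def expected_profit(p_buy: float, margin: float = MARGIN, cost: float = OFFER_COST) -> float:
    """Expected profit of sending an offer to a customer with purchase probability p_buy."""
    return p_buy * margin - cost


def should_send_offer(p_buy: float, margin: float = MARGIN, cost: float = OFFER_COST) -> bool:
    """Send the offer only when its expected profit is positive, i.e. p_buy > cost / margin."""
    return expected_profit(p_buy, margin, cost) > 0


break_even = OFFER_COST / MARGIN
print(f"break-even probability: {break_even:.2f}")  # 0.05 with the assumed values
for p in (0.02, 0.10, 0.30):
    print(p, should_send_offer(p))
```

With these assumed numbers, any customer whose predicted probability is above 5% is worth contacting, so the cutoff comes from the economics of the offer rather than from a precision or recall target.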

Dataset Description: The dataset contains the following features:

MARKETING_SCORE
STATUS_PANTINUM
STATUS_GOLD
STATUS_SILVER
NUM_DEAL
LAST_DEAL
ADVANCE_PURCHASE
CALL_FLAG
CREDIT_PROBLEM
RETURN_FLAG
BENEFIT_FLAG
AVG_FARE
AVG_POINTS
BUYER_FLAG: Target variable (1 if the customer bought the ticket, 0 otherwise)
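Because only about 9% of customers are buyers, the train/test split should keep that ratio in both sets. The supports in the reports below (2,735 of 30,000 and 912 of 10,000, both about 9.1%) are consistent with a stratified 75/25 split, although the post does not say how the split was made. In scikit-learn this is `train_test_split(..., stratify=y)`; the standard-library sketch below only illustrates the idea on synthetic labels, and `stratified_split` is a hypothetical helper, not a library function.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                          # shuffle within each class
        n_test = round(len(idx) * test_fraction)  # same fraction taken from every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def buyer_rate(labels, idx):
    """Share of buyers (label 1) among the given row indices."""
    return sum(labels[i] for i in idx) / len(idx)


# Synthetic labels with a 9% buyer rate (not the real data).
labels = [1] * 90 + [0] * 910
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 750 250
print(buyer_rate(labels, train_idx), buyer_rate(labels, test_idx))
```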

I used both correlation analysis and random forest feature importance to select features. The most important features according to the random forest were:

MARKETING_SCORE
ADVANCE_PURCHASE
LAST_DEAL
NUM_DEAL
CALL_FLAG
CREDIT_PROBLEM
STATUS_SILVER
BENEFIT_FLAG
STATUS_GOLD
RETURN_FLAG
STATUS_PANTINUM

I first trained a logistic regression model on the selected features, but it did not perform well. I then switched to XGBoost with hyperparameter tuning. After training the final XGBoost model and choosing the optimal cutoff (the probability threshold above which a customer is classified as a buyer), I evaluated it on the training and test sets. The results were as follows:
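The post does not say how the "optimal cutoff" was chosen. One common choice, assumed here, is the threshold that maximizes F1 for the buyer class on held-out validation data (not on the test set, which should stay untouched until the end). The labels and probabilities below are hypothetical, and `best_cutoff` is an illustrative helper; scikit-learn's `precision_recall_curve` returns the precision/recall pairs for every threshold if you prefer a library call.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_cutoff(y_true, probs):
    """Try each distinct predicted probability as a cutoff; keep the one with the highest F1."""
    best = (0.5, -1.0)
    for cutoff in sorted(set(probs)):
        y_pred = [1 if p >= cutoff else 0 for p in probs]
        f1 = precision_recall_f1(y_true, y_pred)[2]
        if f1 > best[1]:
            best = (cutoff, f1)
    return best


# Hypothetical validation labels and predicted buyer probabilities.
y_val = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
p_val = [0.05, 0.10, 0.40, 0.20, 0.35, 0.30, 0.08, 0.15, 0.12, 0.02]
cutoff, f1 = best_cutoff(y_val, p_val)
print(cutoff, round(f1, 3))  # 0.35 0.8
```

The F1 objective inside `best_cutoff` could be replaced by an expected-profit function, in which case the threshold is chosen by money rather than by F1.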

Training set:

                  precision    recall  f1-score   support

               0       0.94      0.88      0.91     27265
               1       0.29      0.47      0.36      2735

        accuracy                           0.85     30000
       macro avg       0.62      0.68      0.64     30000
    weighted avg       0.88      0.85      0.86     30000
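The supports in the training report give the class ratio directly. XGBoost's parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value for `scale_pos_weight`. The post does not say whether this parameter was tuned, so the calculation below is only a reference point; the value would be passed as `XGBClassifier(scale_pos_weight=...)`. Reweighting shifts the predicted probabilities upward for buyers, so the cutoff has to be re-chosen afterwards.

```python
# Class counts taken from the "support" column of the training report.
n_negative = 27265  # non-buyers (class 0)
n_positive = 2735   # buyers (class 1)

# XGBoost's documented starting point for scale_pos_weight: negatives / positives.
scale_pos_weight = n_negative / n_positive
print(f"buyer rate: {n_positive / (n_negative + n_positive):.1%}")  # 9.1%
print(f"scale_pos_weight ~ {scale_pos_weight:.2f}")                  # 9.97
```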

Test set:

                  precision    recall  f1-score   support

               0       0.94      0.88      0.91      9088
               1       0.27      0.42      0.33       912

        accuracy                           0.84     10000
       macro avg       0.60      0.65      0.62     10000
    weighted avg       0.88      0.84      0.86     10000
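The test report can be turned back into approximate counts, which is the form a profit calculation needs. The inputs are the rounded figures from the report, so the counts are estimates; the F1 line simply checks that the reported 0.33 is the harmonic mean of precision and recall.

```python
# Reported test-set figures for the buyer class (class 1), copied from the report.
precision, recall, support = 0.27, 0.42, 912
n_test = 10000

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")  # 0.33, matching the report

# Approximate confusion-matrix counts (inputs are rounded to 2 decimals, so these are estimates).
tp = recall * support     # buyers correctly flagged
flagged = tp / precision  # customers who would receive an offer
fp = flagged - tp         # non-buyers who would receive an offer
fn = support - tp         # buyers the model misses
print(f"TP~{tp:.0f}  FP~{fp:.0f}  FN~{fn:.0f}  offers~{flagged:.0f} of {n_test}")
```

Read this way, the model would send roughly 1,400 offers per 10,000 test customers to reach about 380 of the 912 buyers, while missing about 530. Whether that trade-off is good depends on the profit per buyer and the cost per offer, which the report does not show.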

How can I further improve precision and recall for the minority class (buyers) to maximize expected profit? Any insights or suggestions would be greatly appreciated. I'm a student and still learning :)

Comment (Jun 28 at 20:17): If you want to maximize profits, why are you trying to maximize precision and recall for some cutoff value?
