Improving Recall and Precision of the Minority Class with XGBoost to Maximize Profits in Unbalanced Data

The company is interested in identifying profitable customers who are likely to purchase a ticket when given a promotional offer. My goal is to build a model that predicts whether a customer will buy a ticket, and specifically to improve recall and precision for the minority class in order to maximize profit. The target variable is imbalanced: 9% of customers are buyers and 91% are non-buyers.

Dataset Description: The dataset contains the following features:

MARKETING_SCORE
STATUS_PANTINUM
STATUS_GOLD
STATUS_SILVER
NUM_DEAL
LAST_DEAL
ADVANCE_PURCHASE
CALL_FLAG
CREDIT_PROBLEM
RETURN_FLAG
BENEFIT_FLAG
AVG_FARE
AVG_POINTS
BUYER_FLAG: Target variable (1 if the customer bought the ticket, 0 otherwise)

I used both correlation analysis and random forest importance to select features. The most important features based on random forest were:

MARKETING_SCORE
ADVANCE_PURCHASE
LAST_DEAL
NUM_DEAL
CALL_FLAG
CREDIT_PROBLEM
STATUS_SILVER
BENEFIT_FLAG
STATUS_GOLD
RETURN_FLAG
STATUS_PANTINUM

I first trained a logistic regression model on the important features, but it did not perform well. I then used XGBoost with hyperparameter optimization. After training the final XGBoost model and calculating the optimal cutoff, I evaluated it on the training and test sets. The results were as follows:

training

             precision    recall  f1-score   support

           0       0.94      0.88      0.91     27265
           1       0.29      0.47      0.36      2735

    accuracy                           0.85     30000
   macro avg       0.62      0.68      0.64     30000
weighted avg       0.88      0.85      0.86     30000

test

              precision    recall  f1-score   support

           0       0.94      0.88      0.91      9088
           1       0.27      0.42      0.33       912

    accuracy                           0.84     10000
   macro avg       0.60      0.65      0.62     10000
weighted avg       0.88      0.84      0.86     10000
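
For reference, the test-set report can be turned back into approximate confusion-matrix counts, which is what a profit calculation ultimately needs. The sketch below uses only the rounded figures from the test table, so the counts are approximate; the helper name `counts_from_report` is made up for illustration.

```python
# Approximate confusion-matrix counts recovered from a classification report.
# Inputs are the rounded test-set figures from the report, so the results
# are estimates, not the exact counts the model produced.

def counts_from_report(precision, recall, support_pos, support_neg):
    """Return approximate (tp, fp, fn, tn) for the positive class."""
    tp = round(recall * support_pos)        # recall = tp / (tp + fn)
    fn = support_pos - tp                   # buyers the model missed
    predicted_pos = round(tp / precision)   # precision = tp / (tp + fp)
    fp = predicted_pos - tp                 # non-buyers flagged as buyers
    tn = support_neg - fp
    return tp, fp, fn, tn


# Test-set values for class 1 (buyers): precision 0.27, recall 0.42,
# 912 buyers and 9088 non-buyers.
tp, fp, fn, tn = counts_from_report(0.27, 0.42, 912, 9088)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"offers sent at this cutoff: {tp + fp}")
```

Read this way, the current cutoff sends an offer to roughly 1,400 of the 10,000 test customers and reaches a bit over 40% of the actual buyers.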

How can I further improve the recall and precision for the minority class (buyers) to maximize the expected profit? Any insights or suggestions would be greatly appreciated. I'm a student and still learning :)

asked Jun 24 at 13:58

ster111

If you want to maximize profits, why are you trying to maximize precision and recall for some cutoff value?
– picky_porpoise
Commented Jun 28 at 20:17
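
To make the comment's point concrete, here is a minimal sketch of choosing the cutoff by expected profit rather than by precision and recall. Everything numeric in it is an assumption: the post gives no revenue per buyer or cost per offer, so `gain_per_buyer` and `cost_per_offer` are placeholders, and the scores are synthetic stand-ins for the model's predicted probabilities on a validation set.

```python
import random


def expected_profit(y_true, scores, threshold, gain_per_buyer, cost_per_offer):
    """Profit from sending the offer to every customer scored at or above threshold."""
    profit = 0.0
    for y, s in zip(y_true, scores):
        if s >= threshold:
            profit -= cost_per_offer      # every offer sent costs something
            if y == 1:
                profit += gain_per_buyer  # only actual buyers bring revenue
    return profit


def best_threshold(y_true, scores, gain_per_buyer, cost_per_offer, grid=None):
    """Return the threshold on the grid with the highest profit."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return max(
        grid,
        key=lambda t: expected_profit(y_true, scores, t, gain_per_buyer, cost_per_offer),
    )


# Synthetic validation data (assumption): about 9% buyers, with buyers
# tending to receive higher scores than non-buyers.
random.seed(0)
y_val = [1 if random.random() < 0.09 else 0 for _ in range(5000)]
p_val = [
    random.betavariate(3, 5) if y == 1 else random.betavariate(1.5, 8)
    for y in y_val
]

# Hypothetical economics (assumption): 50 gained per buyer, 5 spent per offer.
t_star = best_threshold(y_val, p_val, gain_per_buyer=50, cost_per_offer=5)
print("profit-maximizing threshold:", t_star)
print("profit at that threshold:", expected_profit(y_val, p_val, t_star, 50, 5))
```

If the predicted probabilities are well calibrated, the same logic gives a break-even rule directly: sending an offer is worthwhile when p * gain_per_buyer > cost_per_offer, that is, when p > cost_per_offer / gain_per_buyer. The cutoff should be picked on a validation set and the resulting profit checked on the held-out test set.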
