Say, for example, I have built a classification model for a mailing campaign that will be applied to 1M records. The positive class is customers and the negative class is non-customers.
For the mailing, we select the 100,000 highest-scoring records (those with the highest predicted probabilities). We expect a low response rate, but we want to make sure the mail pieces go to the prospects the model ranks highest.
We bin the model scores into deciles and only select from decile 1 (highest scoring) which we'll say is >= 0.10 predicted probability. The model has a 95% recall and 5% precision at this threshold.
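To make those figures concrete, here is a rough back-of-the-envelope sketch of the counts they imply. It assumes decile 1 holds exactly 10% of the 1M records (100,000) and that the 95% recall and 5% precision are measured on that same selection; the variable names are only for illustration.

```python
# Counts implied by the stated precision and recall.
# Assumption: decile 1 contains exactly 10% of the 1M scored records.
n_records = 1_000_000
n_selected = n_records // 10      # decile 1 -> 100,000 mail pieces
precision = 0.05                  # share of mailed records that are customers
recall = 0.95                     # share of all customers that get mailed

true_positives = round(precision * n_selected)        # customers reached
false_positives = n_selected - true_positives         # non-customers mailed
total_positives = round(true_positives / recall)      # customers in the full 1M
missed_positives = total_positives - true_positives   # customers outside decile 1

print(true_positives, false_positives, total_positives, missed_positives)
# 5000 95000 5263 263
```

Under these assumptions, roughly 5,000 of the 100,000 mail pieces reach actual customers, and only a few hundred customers fall outside decile 1.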
Does it make sense in this scenario to aim for the highest recall possible? Or is precision what matters here, so that I should optimize for that instead, or should I optimize both together?
The goal is to get the most customer conversions within that 100k. We already know there will be a large number of false positives, but we want as many true positives as possible.
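For reference, this is roughly how I would measure both metrics for a fixed mailing size. It is a minimal sketch: `precision_recall_at_k` is a hypothetical helper and the labels and scores are made-up toy data, not output from the real model. One thing it makes visible is that, with the mailing size k fixed, both metrics share the same numerator (the true positives among the top k).

```python
def precision_recall_at_k(y_true, scores, k):
    """Precision and recall when only the k highest-scoring records are mailed."""
    # Rank record indices by predicted probability, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]

    true_positives = sum(y_true[i] for i in top_k)   # customers inside the top k
    total_positives = sum(y_true)                    # customers in the whole file

    precision = true_positives / k
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Toy data (made up for illustration): 1 = customer, 0 = non-customer.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.30, 0.12, 0.02, 0.25, 0.08, 0.01, 0.15, 0.04, 0.03, 0.05]

# Mail the top 3 of 10 records: 2 of the 3 mailed are customers,
# and 2 of the 3 customers in the file are reached.
precision, recall = precision_recall_at_k(y_true, scores, k=3)
print(precision, recall)
```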