I am solving a mixed-integer program whose decision variables are $x \in \{0, 1\}^n$ and $y \in \mathbb{R}^m$, where $0 \leq y_j \leq u_j$ for constant upper bounds.
My primary objective function is of the form $$ \text{minimize } \sum_{i=1}^n c_i x_i $$
where each $c_i > 0$.
However, under this problem's constraints there are many optimal solutions with the same objective value, and the solver typically finds an optimum quickly and then spends a long time improving the dual bound to prove optimality.
To compensate for this, I want to use an objective of the following form as a "tiebreaker": of the set of solutions that minimize $c^T x$, I want to select the one that minimizes $$ \sum_{j=1}^m d_j y_j $$ where each $d_j > 0$. (Actually, all I really care about is the 1-norm of $y$ as a sparsity surrogate, so you can assume $d_j = 1$ if that helps at all.)
At first glance, one way to accomplish this would be the following two-step procedure (a rough OR-Tools sketch follows the steps):
1. Get the solution that minimizes $c^T x$ and let $z$ denote the associated objective value.
2. Solve the MILP again, but change the objective function to $d^T y$ and impose the constraint $c^T x \leq z$.
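Here is a minimal sketch of what I mean, using OR-Tools' `pywraplp` wrapper; `add_constraints()` stands in for my actual problem constraints, and the `eps` tolerance (to avoid cutting off the true optimum due to floating-point error) is an assumption rather than part of my real model:

```python
from ortools.linear_solver import pywraplp


def build_model(c, d, u):
    # Fresh solver instance; the real problem constraints are attached by the
    # (hypothetical) add_constraints() helper.
    solver = pywraplp.Solver.CreateSolver("SCIP")
    x = [solver.IntVar(0, 1, f"x_{i}") for i in range(len(c))]
    y = [solver.NumVar(0, u[j], f"y_{j}") for j in range(len(u))]
    add_constraints(solver, x, y)
    return solver, x, y


def lexicographic_solve(c, d, u, eps=1e-6):
    # Step 1: minimize the primary objective c^T x and record its value z.
    solver, x, y = build_model(c, d, u)
    solver.Minimize(solver.Sum([c[i] * x[i] for i in range(len(c))]))
    solver.Solve()
    z = solver.Objective().Value()

    # Step 2: rebuild the model from scratch (no warm start available), cap the
    # primary objective at z plus a small tolerance, and minimize d^T y.
    solver, x, y = build_model(c, d, u)
    solver.Add(solver.Sum([c[i] * x[i] for i in range(len(c))]) <= z + eps)
    solver.Minimize(solver.Sum([d[j] * y[j] for j in range(len(d))]))
    solver.Solve()
    return [v.solution_value() for v in x], [v.solution_value() for v in y]
```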
I have two problems with this approach:
- In step 1, we know from experience that it takes the solver a long time to prove optimality with respect to $c^T x$, and we are not exploiting our secondary preference to accelerate this process.
- In step 2, it would be ideal to reuse the optimal solution from the previous step as a warm start, but I am required to use Google's OR-Tools (which doesn't provide a warm start API), so this step will take a long time.
So, I am considering adopting the following approach instead:
- Let $\bar c = \min \{c_i - c_j : c_i - c_j > 0\}$ be the smallest positive difference among the $c_i$. The intent is that $\bar c$ is a lower bound on the difference between $z$ and the objective value of any feasible solution that is not optimal.
- Let $\hat d = \sum d_j$ and $\hat u = \sum u_j$.
- Solve the whole thing in one shot by minimizing $$\sum_{i=1}^n c_i x_i + \frac{\bar c}{\hat d \hat u}\sum_{j=1}^m d_j y_j .$$
The constants in the second term confine its value to the range $[0, \bar c]$ (since $\sum_j d_j y_j \leq \sum_j d_j u_j \leq \hat d \hat u$), meaning that the first term must be optimized before the second has any effect.
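In code, the one-shot version would look something like the following sketch, again reusing the hypothetical `build_model()` helper from above:

```python
def blended_solve(c, d, u):
    # Smallest positive pairwise difference among the c_i (see the Edit below
    # for why this is not actually a valid bound on the optimality gap).
    c_bar = min(ci - cj for ci in c for cj in c if ci - cj > 0)
    d_hat, u_hat = sum(d), sum(u)
    scale = c_bar / (d_hat * u_hat)

    solver, x, y = build_model(c, d, u)
    # Blended objective: c^T x plus the down-scaled secondary term d^T y.
    solver.Minimize(
        solver.Sum([c[i] * x[i] for i in range(len(c))])
        + solver.Sum([scale * d[j] * y[j] for j in range(len(d))])
    )
    solver.Solve()
    return [v.solution_value() for v in x], [v.solution_value() for v in y]
```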
The risk of this approach is that for large $m$, the coefficients on the $y_j$ will be very small relative to the $c_i$, and the problem may be poorly conditioned.
My question is: is this approach valid, and if so, can the risk of numerical instability for large $m$ be mitigated?
Edit: I don't think the approach is valid to begin with: If $c_1 = 2$, $c_2 = 0.99$, and $c_3 = 0.99$, then we get $\bar c = 1.01$, but the solutions $x = (1, 0, 0)$ and $x = (0, 1, 1)$ differ in objective value by only $0.02 < \bar c$ (assuming they are feasible).
The question of how to perform this lexicographic optimization in a way that avoids the problems with the $c^T x \leq z$ approach still stands.