I am running a simulation where I need to repeatedly solve a set of LPs or QPs with slightly different input parameters for a Model Predictive Control application. The catch is that it needs to be fast, so exploiting some parallelization is probably required, ideally from Python.

Here is the problem:

$$ \min_{u_{i,j,k}} \quad J_{i,j} = \sum_{k=0}^{N} \Big( R_{i,j,k}\, u_{i,j,k} + x_{i,j,k}^\top Q_{i,j,k}\, x_{i,j,k} \Big), \qquad N = 30 $$

subject to:

$$ x_{i,j,k+1} = A_{i,k} x_{i,j,k} + B_{i,k} u_{i,j,k} $$

$$ \underline{u}_{i,j,k} \leq u_{i,j,k} \leq \overline{u}_{i,j,k} $$

$$ \underline{x}_{i,j,k} \leq x_{i,j,k} \leq \overline{x}_{i,j,k} $$

$$ x_{i,j,0} = x_i^{(0)} $$

$$ i \in \{1, \dots, 30\}, \qquad j \in \{1, \dots, 30\} $$

Note that the quadratic term $x^\top Q x$ could also be replaced by a linear cost $Q x$ (with $Q$ acting as a row vector) to keep the problem an LP.

Some ideas I've had so far:

  1. Treating the set as one big optimization problem: this works but is incredibly slow (tried with Pyomo, CPLEX, and CVXPY). Maybe there is another library designed for fast repeated solving like this.

  2. Using a GPU computing library like PyTorch and running gradient descent in parallel: we would convert the constrained LP/QP into an unconstrained NLP with logarithmic barrier functions, essentially an interior-point method on the GPU (see the sketch after this list). It could potentially be faster, but the implementation could become fairly complex.

  3. Multi-parametric programming/Explicit MPC: probably the best solution in principle, but I cannot find a method to compute the explicit solution. The difficulty is that $R$ is a linear cost and can be negative. UPDATE: this is infeasible due to the problem size and the time-varying state-space matrices.
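For concreteness, here is a minimal sketch of what idea 2 could look like: batched log-barrier minimization of box-constrained problems in PyTorch. All dimensions, data, and the fixed barrier weight below are illustrative placeholders; a real interior-point method would shrink the barrier weight over iterations and use Newton steps rather than first-order updates.

import torch

# 900 independent problems with 60 decision variables each (placeholders).
num_probs, n = 900, 60
device = "cuda" if torch.cuda.is_available() else "cpu"

c = torch.randn(num_probs, n, device=device)   # per-problem linear costs
lo = -torch.ones(num_probs, n, device=device)  # lower bounds
hi = torch.ones(num_probs, n, device=device)   # upper bounds
u = torch.zeros(num_probs, n, device=device, requires_grad=True)
t = 10.0                                       # fixed barrier weight (illustrative)

opt = torch.optim.SGD([u], lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    # Linear objective plus log barriers for the box constraints. The sum
    # over problems decouples, so this is 900 independent solves in parallel.
    f = (c * u).sum() - (1.0 / t) * (torch.log(hi - u) + torch.log(u - lo)).sum()
    f.backward()
    opt.step()
    with torch.no_grad():
        # Keep iterates strictly inside the box so the logs stay finite.
        u.clamp_(lo + 1e-6, hi - 1e-6)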

Any other ideas?

  • Just FYI: SGD implementations in PyTorch are for separable, unconstrained optimization problems. SGD is not appropriate for this problem. Commented Jan 31, 2022 at 18:09
  • For accuracy, you should edit your post and title to indicate these are quadratic programs.
    – prubin
    Commented Jan 31, 2022 at 19:05
  • I believe you need to change LPs to QPs in the title and first sentence. @prubin correctly referred to QPs rather than LPs in his answer. Commented Jan 31, 2022 at 21:14
  • Thanks, I had a few different iterations where the $x^\top Q x$ was $Q x$ instead. It is updated now.
    – Zach Lee
    Commented Feb 1, 2022 at 13:13
  • Robert, I should have mentioned that I added logarithmic barrier functions for constraint violations, as in the interior-point method. I believe this is how convex NLP solvers like Ipopt work? It could become a bit complex for a custom implementation though.
    – Zach Lee
    Commented Feb 1, 2022 at 13:28

5 Answers


The OPTMODEL modeling language in SAS (disclaimer: I work at SAS) supports two features for solving independent optimization problems (LP or otherwise) concurrently:

  • The COFOR loop, which automatically parallelizes the solver calls.
  • The runOptmodel action supports BY-group processing, which automatically parallelizes everything (data processing, problem generation, solver calls, etc.) for problems that have the same structure but different data.

For LP and MILP, an alternative approach is to formulate as one big problem and use the METHOD=CONCOMP option in the decomposition algorithm.

All three features have Python interfaces.
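As a rough illustration of the first feature, here is a hedged sketch of submitting an OPTMODEL program with a COFOR loop through the SWAT Python package. The connection details and the OPTMODEL body are placeholders; consult the SAS documentation for the exact declarations your model needs.

import swat

# Connect to a running CAS server (host/port are placeholders).
conn = swat.CAS("cas-server.example.com", 5570)
conn.loadactionset("optimization")

pgm = """
set SCENARIOS = 1..900;
/* ... declare parameters, variables, and constraints here ... */
cofor {s in SCENARIOS} do;
    /* fix the data for scenario s, then solve; COFOR runs these
       solver calls concurrently */
    solve;
end;
"""
conn.runOptmodel(code=pgm)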

  • This sounds perfect, I'll check it out, thanks. I knew there had to be some kind of parallelized library like this that I could not find.
    – Zach Lee
    Commented Feb 1, 2022 at 13:18

A very easy way to do this is to use multiprocessing alongside cvxpy. It won't be the fastest approach possible, but since you want to stick to Python and avoid low-level C/C++/Fortran code, it's clear that you intend to leave some performance on the table for ease of implementation (and I don't blame you).

Here's an example of how to do it, adapted from the cvxpy documentation. This solves a Lasso problem for various values of the penalty parameter, but the extension to LPs (or what looks like QPs in your post) should be clear.

As an added bonus, this only uses free and open source tools. You can optionally decide to link cvxpy to a commercial solver if you have one available.

import cvxpy as cp
import numpy
from multiprocessing import Pool

# Problem data.
n = 15
m = 10
numpy.random.seed(1)
A = numpy.random.randn(n, m)
b = numpy.random.randn(n)
# gamma must be nonnegative due to DCP rules.
gamma = cp.Parameter(nonneg=True)
# Penalty values to sweep over.
gamma_vals = numpy.logspace(-4, 6, 50)

# Construct the problem once; only the parameter changes between solves.
x = cp.Variable(m)
error = cp.sum_squares(A @ x - b)
obj = cp.Minimize(error + gamma * cp.norm(x, 1))
prob = cp.Problem(obj)

# Assign a value to gamma and find the optimal x.
def get_x(gamma_value):
    gamma.value = gamma_value
    prob.solve()
    return x.value

if __name__ == "__main__":
    # Parallel computation; increase `processes` to use more cores.
    with Pool(processes=4) as pool:
        x_values = pool.map(get_x, gamma_vals)

  • Experimenting with the optimizer that cvxpy employs may be worthwhile. I am not sure what it uses by default. Commented Feb 2, 2022 at 6:04
  • IIRC the default solver for QPs in cvxpy is OSQP. Commented Feb 2, 2022 at 12:49

After a good bit of experimentation based on the ideas posted, here was my solution:

  1. Do as many matrix multiplications as possible up front using PyTorch on the GPU to simplify the problem. This means two things. First, combine $x_{i,j,k+1} = A_{i,j,k} x_{i,j,k} + B_{i,j,k} u_{i,j,k}$ into a lower-triangular block matrix equation containing all timesteps (see the PyTorch sketch after this list):

$$ \mathcal{X}_{i,j} = \mathcal{A}_{i,j} \mathcal{U}_{i,j}$$

where
\begin{equation*}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}
=
\begin{bmatrix}
I & 0 & 0 & \cdots & 0 \\
A_{i,j,0} & B_{i,j,0} & 0 & \cdots & 0 \\
A_{i,j,1} A_{i,j,0} & A_{i,j,1} B_{i,j,0} & B_{i,j,1} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots
\end{bmatrix}
\begin{bmatrix} x^{(0)}_{i,j} \\ u_{i,j,0} \\ u_{i,j,1} \\ \vdots \\ u_{i,j,N-1} \end{bmatrix}
\end{equation*}

However, $Q_{i,j,k}$ is very sparse, meaning that only a few entries of $x$ matter. Therefore we can left-multiply $\mathcal{A}_{i,j}$ by a block selection matrix $\mathcal{C}$ that masks out the irrelevant rows and reduces the dimension by a factor of roughly 10:

$$ \mathcal{Y}_{i,j} = \mathcal{C} \mathcal{A}_{i,j} \mathcal{U}_{i,j}$$

This essentially performs all of the time-based matrix multiplications up front on the GPU before building the model.

  2. Use Gurobi's Python API directly instead of a third-party layer like Pyomo (if possible). This speeds up model building and allows the use of the multi-scenario interface, which avoids rebuilding the model when certain conditions are met. I think this functions similarly to a warm start (see the gurobipy sketch after this list).
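A hedged sketch of step 1, assuming batched time-varying dynamics stored as PyTorch tensors (the dimensions and names below are placeholders, not the actual model data). It builds the block lower-triangular prediction matrix for all 900 problems at once with batched matmuls:

import torch

num_probs = 900                 # one problem per (i, j) pair
nx, nu, N = 4, 2, 30            # state/input sizes and horizon (placeholders)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Time-varying dynamics, one (A_k, B_k) sequence per problem.
A = 0.1 * torch.randn(num_probs, N, nx, nx, device=device)
B = torch.randn(num_probs, N, nx, nu, device=device)

# Phi[:, k] maps x0 to x_{k+1}:   Phi_k = A_k ... A_0
# G[:, k, l] maps u_l to x_{k+1}: G_{k,l} = A_k ... A_{l+1} B_l for l < k
Phi = torch.zeros(num_probs, N, nx, nx, device=device)
G = torch.zeros(num_probs, N, N, nx, nu, device=device)

running = torch.eye(nx, device=device).expand(num_probs, nx, nx).clone()
for k in range(N):
    running = torch.bmm(A[:, k], running)        # accumulate A_k ... A_0
    Phi[:, k] = running
    G[:, k, k] = B[:, k]
    for l in range(k):
        G[:, k, l] = torch.bmm(A[:, k], G[:, k - 1, l])

# Flatten into one dense (N*nx) x (N*nu) prediction matrix per problem;
# a sparsity mask (the C matrix above) would then drop the unused rows.
G_dense = G.permute(0, 1, 3, 2, 4).reshape(num_probs, N * nx, N * nu)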
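And a hedged sketch of step 2's multi-scenario interface in gurobipy, shown here for a tiny made-up LP. The attributes NumScenarios, ScenarioNumber, ScenNObj, ScenNObjVal, and ScenNX are from Gurobi's documented multi-scenario API; note that scenarios may only differ in linear objective coefficients, variable bounds, and right-hand sides, which fits the linear variant of the problem:

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("multiscenario-lp")
u = m.addVars(3, lb=-1.0, ub=1.0, name="u")
m.setObjective(u[0] + u[1] + u[2], GRB.MINIMIZE)

# Declare 4 scenarios that differ only in one objective coefficient.
m.NumScenarios = 4
for s in range(4):
    m.Params.ScenarioNumber = s
    u[0].ScenNObj = 1.0 + 0.5 * s   # scenario-specific cost on u[0]

m.optimize()                         # one call solves all scenarios

for s in range(4):
    m.Params.ScenarioNumber = s
    print(s, m.ScenNObjVal, [u[i].ScenNX for i in range(3)])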

Result: A 1000x speedup from my original Pyomo+Gurobi implementation!

  • Nice job on the sparsity. Commented Feb 2, 2022 at 15:40

If I understand this correctly, you are solving 900 QPs (one for each combination of $i$ and $j$), tweaking the parameters, then solving all 900 again (and again). One possibility to try would be hot-starting each problem (after the initial round of 900) using the solution to the problem from the previous round with the same $i,j$ combination. This would require storing and reading the 900 current solutions. Whether hot starting helps or not depends on how much the models change (and, frankly, chance).
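For example, with cvxpy the bookkeeping could look like the following hedged sketch: one parameterized problem per $(i,j)$ held in a dictionary, re-solved each round with warm_start=True so the solver can start from that problem's previous solution. The tiny box-constrained LP here is a placeholder for the real QP:

import cvxpy as cp
import numpy as np

n = 30                                   # decision dimension (placeholder)
problems = {}
for i in range(1, 31):
    for j in range(1, 31):
        u = cp.Variable(n)
        r = cp.Parameter(n)              # the data tweaked between rounds
        prob = cp.Problem(cp.Minimize(r @ u), [u >= -1, u <= 1])
        problems[i, j] = (prob, r)

for round_idx in range(3):               # repeated rounds of 900 solves
    for (i, j), (prob, r) in problems.items():
        r.value = np.random.randn(n)     # new parameters for this round
        prob.solve(warm_start=True)      # start from the previous solution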

  • I've played around with warm starts without a ton of success, unfortunately... You are correct, and I had the same intuition. I'm sure there is a way to get this speed-up, so maybe I'll work on improving my implementation.
    – Zach Lee
    Commented Feb 1, 2022 at 13:42

I suggest you consider the Parameterized Fusion API for MOSEK (available in Python). You can use it to construct your model without passing actual data for the parameter values, and then set the parameter values before solving. Over repeated solves, this can potentially save you a lot of time. While constructing some examples here at MOSEK, I have often seen Parameterized Fusion take about a fifth of the time as compared to Fusion. This was naturally for problems where constructing the model takes a significant amount of time (relative to solving it).

Additionally, if you make your problem linear (as you mentioned), then you can use the warm-start functionality of the simplex solver. That should also make a huge difference. Again, in a particular instance, I saw the total time reduced to a fifth of what it would be without warm-starting.
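As a rough illustration (a hedged sketch, not code from this answer): in the Python Fusion API the pattern is to declare parameters on the Model once, then call setValue before each solve. The LP data below is made up:

import numpy as np
from mosek.fusion import Model, Domain, Expr, Matrix, ObjectiveSense

n, mcon = 5, 3
A = np.random.randn(mcon, n)

with Model("parameterized-lp") as M:
    x = M.variable("x", n, Domain.inRange(-1.0, 1.0))
    c = M.parameter("c", n)        # objective data, set before each solve
    b = M.parameter("b", mcon)     # right-hand side, set before each solve
    M.constraint(Expr.sub(Expr.mul(Matrix.dense(A), x), b),
                 Domain.lessThan(0.0))
    M.objective(ObjectiveSense.Minimize, Expr.dot(c, x))

    # The model is built once; repeated solves only swap parameter values.
    for _ in range(3):
        c.setValue(np.random.randn(n))
        b.setValue(np.ones(mcon))
        M.solve()
        print(M.primalObjValue())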

