
Recently, I found out why (most) Integer Programming and Mixed Integer Programming optimization problems are usually considered Non-Convex. Even though the objective function in such problems (e.g. the Traveling Salesman Problem) is a Convex Function, the set of constraints that this objective function has to be optimized over is Non-Convex (e.g. the set of integers is by definition a Non-Convex set), so the entire problem becomes a Non-Convex problem.

This being said, I have the following question:

1) In the case of the Traveling Salesman Problem, if we look at the "objective function" in this problem:

$$\sum_i \sum_j c_{ij} x_{ij}$$

Is this function $\sum_i \sum_j c_{ij}x_{ij}$ differentiable over its constraint set? That is, can we take its derivative with respect to the decision variables $x_{ij}$ while respecting the constraints?

My inclination is "no", this function is non-differentiable: I have never seen the Traveling Salesman Problem solved using an optimization algorithm that requires evaluating a derivative (e.g. gradient descent); it is instead always solved using optimization algorithms that do not require derivatives (e.g. Branch and Bound, Evolutionary and Metaheuristic Algorithms).
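For concreteness, here is a minimal sketch (with a made-up 4-city cost matrix) of what differentiating $\sum_i \sum_j c_{ij}x_{ij}$ with respect to the decision variables would even look like, and what happens to a feasible 0/1 tour after a single gradient-style step:

```python
import numpy as np

# Hypothetical 4-city instance with a made-up symmetric cost matrix.
rng = np.random.default_rng(0)
n = 4
c = rng.uniform(1, 10, size=(n, n))
c = (c + c.T) / 2
np.fill_diagonal(c, 0.0)

# A feasible tour encoded as a 0/1 matrix: x[i, j] = 1 if the tour travels i -> j.
tour = [0, 2, 1, 3]
x = np.zeros((n, n))
for i, j in zip(tour, tour[1:] + tour[:1]):
    x[i, j] = 1.0

objective = np.sum(c * x)   # sum_i sum_j c_ij * x_ij

# Treating the x_ij as continuous variables, the objective is linear,
# so its gradient with respect to x is simply the constant matrix c.
grad = c

# One gradient-descent step immediately leaves the set of 0/1 tour matrices:
x_step = x - 0.1 * grad
print(objective)
print(np.unique(np.round(x_step, 3)))   # entries are no longer just 0 and 1
```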

2) Consider the "Decision Tree" statistical/machine learning model (e.g. CART https://en.wikipedia.org/wiki/Decision_tree_learning) that is often used for supervised classification and regression tasks.

Given a dataset, a Decision Tree (e.g. CART) is formed by "splitting" variables into smaller subsets (i.e. "nodes") such that "purity" increases in each subsequent subset; "purity" is often measured through some sort of "Information Gain" based on measures such as the "Gini Index" or "Entropy". Thus, Decision Trees can be interpreted as an optimization problem in which "Information Gain" has to be optimized. I have heard that, since Decision Trees often involve different variable types (e.g. continuous and categorical), searching for the optimal variable splits that maximize "Information Gain" is a Mixed Integer Optimization Problem with an enormous Combinatorial Search Space. In the interest of creating a decent Decision Tree in a reasonable amount of time, "Information Gain" is optimized using a "Greedy Search Algorithm", and as a result the final Decision Tree (i.e. the answer to this Mixed Integer Optimization Problem) is almost certainly not the optimal Decision Tree (a better Decision Tree very likely exists somewhere in this large Discrete Combinatorial Search Space, but finding it would take too much time):

$$\text{Information Gain} = I(\text{parent}) \;-\; \sum_{k} \frac{N_k}{N}\, I(\text{child}_k)$$

where $I(\cdot)$ is an impurity measure such as the "Gini Index" or "Entropy", and $N_k/N$ is the fraction of observations falling into child node $k$.

The above equation is the general function for "Information Gain". Is this function differentiable with respect to its constraints? (I am guessing that the "constraints" in this case refer to Decision Trees being unable to make "illogical splits" that contradict previous splits, or to instances in which the user specifies a minimum number of observations that each split must contain.)

Again, my inclination is "no", this function is not differentiable: I have never seen any references showing this optimization problem being solved with an algorithm that requires evaluating a derivative (e.g. Gradient Descent). Significant research is being done on improving optimization algorithms for this problem, and those algorithms do not involve derivatives (e.g. Branch and Bound https://arxiv.org/pdf/1904.12847.pdf, Evolutionary Algorithms https://cran.r-project.org/web/packages/evtree/vignettes/evtree.pdf).
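For concreteness, here is a small sketch of the greedy split search described above, using entropy-based Information Gain on a single made-up continuous feature (the data and function names are hypothetical, purely to illustrate the search). The gain is a piecewise-constant function of the split threshold, so there is no informative derivative to follow:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(feature, y):
    """Greedy search over thresholds of one continuous feature,
    maximising information gain = H(parent) - weighted H(children)."""
    parent = entropy(y)
    best_gain, best_t = 0.0, None
    for t in np.unique(feature)[:-1]:          # candidate thresholds
        left, right = y[feature <= t], y[feature > t]
        w = len(left) / len(y)
        gain = parent - (w * entropy(left) + (1 - w) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Toy data: one feature, binary labels.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))   # splits cleanly between 3.0 and 4.0
```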

Thus, my rationale as to why the above functions are non-differentiable is unfortunately not mathematical but anecdotal.

  • But in general, is it relatively straightforward to establish that "all functions defined over Non-Convex sets are by definition non-differentiable"?

  • Could we technically still use "Gradient Descent" on problems like the Traveling Salesman Problem and Decision Trees (e.g. "relax" the problem, use Gradient Descent, check whether the solution falls within the feasible region, explore for a solution in its neighborhood, repeat, etc.), with the only problem being that it would be extremely inefficient compared to Gradient-Free Optimization Methods (e.g. Metaheuristics/Evolutionary Algorithms, Branch and Bound)? Or are we, by definition, mathematically unable to use Gradient Descent on problems like the Traveling Salesman Problem and Decision Trees? (A rough sketch of the "relax and project" idea is included after the note below.)

Note: I have heard that Gradient Descent on Traveling Salesman is like "Nearest Neighbor Search", yet I do not fully understand why this is. I am guessing that maybe you can use Gradient Descent on Traveling Salesman - but it would be extremely ineffective as it would provide no "useful" information on which path to explore next?
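As an illustration of the "relax, take gradient steps, project/round, check feasibility" idea from the second bullet, here is a rough sketch on an assignment-style relaxation of a TSP-like problem. It ignores subtour constraints, uses a made-up cost matrix and penalty weight, and is not a practical TSP method; it only shows the mechanics being asked about:

```python
import numpy as np

# Made-up 4-city cost matrix (assignment-style relaxation; subtour constraints ignored).
rng = np.random.default_rng(1)
n = 4
c = rng.uniform(1, 10, size=(n, n))

rho = 10.0                       # made-up penalty weight on the row/column-sum constraints
x = np.full((n, n), 1.0 / n)     # start from the "uniform" relaxed point

def grad(x):
    # Gradient of  sum(c * x) + rho * (||row_sums - 1||^2 + ||col_sums - 1||^2).
    row_err = x.sum(axis=1) - 1.0
    col_err = x.sum(axis=0) - 1.0
    return c + 2.0 * rho * (row_err[:, None] + col_err[None, :])

for _ in range(500):
    x = np.clip(x - 0.01 * grad(x), 0.0, 1.0)    # projected gradient step onto the box [0, 1]

x_rounded = (x > 0.5).astype(float)              # round back to 0/1
feasible = np.allclose(x_rounded.sum(axis=1), 1.0) and np.allclose(x_rounded.sum(axis=0), 1.0)
print(x_rounded)
print("feasible assignment after rounding:", feasible)
# The rounded point is often infeasible, which is exactly why one would then have to
# search its neighborhood, repair it, and repeat -- and why this is so much clumsier
# than Branch and Bound or a metaheuristic applied to the discrete problem directly.
```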

  • "all functions defined over Non-Convex sets are by definition non-differentiable" > No. Consider $f: x \to x$ defined on $]-\infty, -1] \cup [1, +\infty[$. It's defined on a non-convex set and is differentiable.
    – fontanf
    Commented Mar 1, 2022 at 7:22
  • I think you might be asking whether discrete calculus has ever found applications in Operational Research. To the best of my knowledge, the answer is no... at least not in a significant way.
    Commented Mar 1, 2022 at 10:19
  • I disagree with the assertion that problems with discrete domains are automatically nonconvex. When we discuss convexity in this context, we typically are referring to convexity of the continuous relaxation. So an integer linear program will be both discrete (due to integrality) and "convex" (because the constraints and objective are linear).
    – prubin
    Commented Mar 1, 2022 at 17:54
  • Thank you everyone for your answers! If this is the case, could someone please explain why we don't use optimization algorithms like gradient descent on travelling salesman? This is probably an obvious question.... but I think I am missing something. Thanks!
    – stats_noob
    Commented Mar 2, 2022 at 14:14
  • @stats_noob gradient descent is for unconstrained problems and continuous variables. How would you apply it to the TSP? For discrete problems, it's possible to use similar ideas inside a local search framework en.wikipedia.org/wiki/Hill_climbing
    – fontanf
    Commented Mar 2, 2022 at 14:21

2 Answers


I'll address question (1) only. The objective function of the TSP formulation is linear. It is therefore differentiable. But the feasible region consists of a set of discrete points, so the objective function is not differentiable over the feasible set.

And even if the feasible region were continuous—for example, if you were solving the LP relaxation of the IP formulation—differentiability doesn't help us. That's because the objective function itself has no local minima (because it is linear). So if you take a derivative of the objective function and set it to 0, there will be no solutions. (Try it—it's easy to see why.) The optimal solutions of the LP occur at corner points of the feasible region, not at stationary points of the objective function. This is true not just for the LP relaxation of the TSP, but for any LP.
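A minimal illustration of this point, using a tiny made-up LP (not an actual TSP relaxation) and SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# Tiny made-up LP: minimize c @ x  subject to  x1 + x2 + x3 >= 1,  0 <= x <= 1.
c = np.array([3.0, 1.0, 2.0])
A_ub = np.array([[-1.0, -1.0, -1.0]])   # -(x1 + x2 + x3) <= -1
b_ub = np.array([-1.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 3, method="highs")

# The gradient of the linear objective is the constant vector c; it is never zero,
# so there are no stationary points. The optimum sits at a corner of the feasible region.
print(res.x)   # vertex solution, here [0., 1., 0.]
print(c)       # gradient everywhere: [3. 1. 2.]
```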

In fact, the simplex method for solving LPs can be viewed as a sort of gradient descent method—see https://math.stackexchange.com/q/3254423/663638.


Derivatives can be very important for solving nonconvex problems, especially Mixed Integer NonLinear Programs (MINLP). A standard approach in Operations Research for solving a nonconvex problem is to form a convex relaxation of the feasible region, e.g., drop the integrality requirements. Then

  1. Solve the convex relaxation, typically using derivatives. Generally, this is a relatively easy problem and produces a global optimum to the relaxed problem.

  2. Pray that the solution to the relaxed problem is also a solution (feasible) to the original nonconvex problem.

    If yes, we are done,

    If no, split the original problem into two, hopefully easier, problems and repeat.

This is the essence of Branch-and-Bound, the workhorse algorithm for solving non-convex problems.
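For illustration, here is a minimal sketch of that loop on a tiny two-variable integer program. The instance is made up, the relaxations are solved with SciPy's `linprog`, and it branches on the most fractional variable:

```python
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    """Minimal branch-and-bound for  min c@x  s.t.  A_ub@x <= b_ub,  x integer within bounds.
    Illustrative only: it prunes subproblems whose relaxation bound cannot beat the incumbent."""
    best_x, best_val = None, np.inf
    stack = [bounds]
    while stack:
        bnds = stack.pop()
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bnds, method="highs")
        if not res.success or res.fun >= best_val:
            continue                         # relaxation infeasible or cannot beat the incumbent
        x = res.x
        frac = np.abs(x - np.round(x))
        if np.all(frac < 1e-6):              # relaxation solution is already integral
            best_x, best_val = np.round(x), res.fun
            continue
        i = int(np.argmax(frac))             # branch on the most fractional variable
        lo, hi = bnds[i]
        left = list(bnds);  left[i] = (lo, np.floor(x[i]))
        right = list(bnds); right[i] = (np.ceil(x[i]), hi)
        stack.extend([left, right])
    return best_x, best_val

# Toy integer program: min -x0 - x1  s.t.  2*x0 + x1 <= 5,  x0 + 2*x1 <= 5,  x in {0,...,3}^2
c = np.array([-1.0, -1.0])
A_ub = np.array([[2.0, 1.0], [1.0, 2.0]])
b_ub = np.array([5.0, 5.0])
print(branch_and_bound(c, A_ub, b_ub, [(0, 3), (0, 3)]))   # e.g. (array([2., 1.]), -3.0)
```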

  • @LINDO Systems: Thank you for your answer! If this is the case, could you please explain why we don't use optimization algorithms like gradient descent on travelling salesman? This is probably an obvious question.... but I think I am missing something. Thanks!
    – stats_noob
    Commented Mar 2, 2022 at 14:16
