$\begingroup$

My current research project uses adaptive weights for three different loss functions, so that each objective can focus on a different object size depending on the size of the input.

Say there are three ranges: $0-300$ is for small objects, $300-900$ is for middle-sized objects and $>900$ is for large objects.

My current design, with input $x$: $$ \begin{align*} y_1&=\frac{1}{1+\exp(-0.01(x-600))}\\ y_2&=\frac{1}{1+\exp(+0.01(x-600))}\\ y_3&= \frac{1}{1 + \exp(-0.02(x-300))}+ \frac{1}{1 + \exp(0.02(x-900))}-1 \end{align*} $$

This gives: [image: plot of the three weight functions]

However, the problem is that $\sum_{i=1}^{3} y_i(x)\neq 1$ for all $x\geq 0$. A simple fix is to design two piecewise functions: $$ \begin{align*} y_1&=\frac{1}{1 + \exp(-0.02(x-300))}+ \frac{1}{1 + \exp(0.02(x-900))}-1\\ y_2&=\frac{1}{1+ \exp(0.02(x-300))}+ \frac{1}{1 + \exp(-0.02(x-900))}, \end{align*} $$ where the high-pass filter in $y_1$ applies for $x<600$ and the low-pass filter in $y_1$ applies for $x>600$, and is zero otherwise. [image: plot of the piecewise functions]
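The failure of the first design to sum to one can be checked numerically. The sketch below assumes NumPy; the helper names `sigmoid` and `first_design` are hypothetical and only mirror the three formulas of the first design. Since $y_1+y_2=1$ identically, the total is $1+y_3$, which stays above one because $y_3>0$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def first_design(x):
    """Three weights of the first design, stacked as rows (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y1 = sigmoid(0.01 * (x - 600.0))
    y2 = sigmoid(-0.01 * (x - 600.0))
    # Band-pass term: high-pass at 300 plus low-pass at 900, minus 1.
    y3 = sigmoid(0.02 * (x - 300.0)) + sigmoid(-0.02 * (x - 900.0)) - 1.0
    return np.stack([y1, y2, y3])

x = np.linspace(0.0, 1200.0, 7)
weights = first_design(x)
total = weights.sum(axis=0)
print(total)  # every entry exceeds 1, by exactly the band-pass term y3
```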

However, I prefer the first set of continuous functions for its simplicity. Is there a more elegant solution in which the three functions form a partition of unity and are not piecewise? Thanks in advance for any suggestions.

$\endgroup$

1 Answer

$\begingroup$

You were pretty close! Start by defining linear functions that are easy to reason about:

$$\begin{align*} w_1&=\frac{300-x}{50}=0.02(300-x) \\ w_2&=0 \\ w_3&=\frac{x-900}{50}=0.02(x-900) \end{align*}$$

We have chosen $300$ and $900$ as our crossover points, and $50$ as the characteristic $x$-scale over which the differences between the $w_i$ vary by $1$ unit. So for $x<300$, we have $w_1>w_2>w_3$; for $300<x<600$, $w_2>w_1>w_3$; for $600<x<900$, we have $w_2>w_3>w_1$; and for $x>900$, we have $w_3>w_2>w_1$.
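The ordering in each of the four regions can be spot-checked with a small sketch, assuming NumPy. The function name `logits` and the sample points are illustrative choices, one point per region.

```python
import numpy as np

def logits(x):
    """Linear scores w1, w2, w3 stacked as rows (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w1 = 0.02 * (300.0 - x)
    w2 = np.zeros_like(x)
    w3 = 0.02 * (x - 900.0)
    return np.stack([w1, w2, w3])

# One sample per region: x<300, 300<x<600, 600<x<900, x>900.
samples = np.array([100.0, 450.0, 750.0, 1000.0])

# For each sample, list the 1-based indices of w sorted from largest to smallest.
order = np.argsort(-logits(samples), axis=0).T + 1
print(order)
# [[1 2 3]
#  [2 1 3]
#  [2 3 1]
#  [3 2 1]]
```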

Now define $(y_1,y_2,y_3)=\sigma(w_1,w_2,w_3)$ where $\sigma$ is the softmax function. Explicitly:

$$\begin{align*} y_1&=\frac{\exp(w_1)}{\exp(w_1)+\exp(w_2)+\exp(w_3)}=\frac{\exp\big(0.02(300-x)\big)}{\exp\big(0.02(300-x)\big)+1+\exp\big(0.02(x-900)\big)} \\ y_2&=\frac{\exp(w_2)}{\exp(w_1)+\exp(w_2)+\exp(w_3)}=\frac{1}{\exp\big(0.02(300-x)\big)+1+\exp\big(0.02(x-900)\big)} \\ y_3&=\frac{\exp(w_3)}{\exp(w_1)+\exp(w_2)+\exp(w_3)}=\frac{\exp\big(0.02(x-900)\big)}{\exp\big(0.02(300-x)\big)+1+\exp\big(0.02(x-900)\big)} \end{align*}$$

Because the denominators across all three $y_i$ are explicitly identical, and the $\exp$ function is monotonic, the $y_i$ are always in the same order as the $w_i$. We have the same crossover points, and $50$ is the characteristic $x$-scale over which the $y_i$ decay. Moreover, it's clear that $\sum y_i=1$.
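As a minimal sketch of this construction, assuming NumPy: the function name `size_weights` and its parameters `low`, `high`, and `scale` are hypothetical, defaulting to the values $300$, $900$, and $50$ used above. Subtracting the column-wise maximum before exponentiating is a common stability trick and does not change the softmax output.

```python
import numpy as np

def size_weights(x, low=300.0, high=900.0, scale=50.0):
    """Softmax of the linear scores (w1, w2, w3); rows are y1, y2, y3."""
    x = np.asarray(x, dtype=float)
    w = np.stack([(low - x) / scale, np.zeros_like(x), (x - high) / scale])
    # Shift by the maximum score so np.exp cannot overflow for large |x|.
    w = w - w.max(axis=0, keepdims=True)
    e = np.exp(w)
    return e / e.sum(axis=0, keepdims=True)

x = np.array([0.0, 300.0, 600.0, 900.0, 1200.0])
y = size_weights(x)
print(y.sum(axis=0))  # each column sums to 1 up to rounding
```

At the crossover points the two neighbouring weights coincide: $y_1=y_2$ at $x=300$ and $y_2=y_3$ at $x=900$.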

If you prefer, you can also "simplify" the expressions for $y_1$ and $y_3$ to a fraction like $1/(1+\exp(a)+\exp(b))$ by dividing out the numerator. However, I think that makes the relationship between the $y_i$ less clear.
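For reference, carrying out that division (by $\exp(w_1)$ in $y_1$ and by $\exp(w_3)$ in $y_3$) should give:

$$\begin{align*} y_1&=\frac{1}{1+\exp\big(0.02(x-300)\big)+\exp\big(0.04(x-600)\big)} \\ y_3&=\frac{1}{1+\exp\big(0.02(900-x)\big)+\exp\big(0.04(600-x)\big)} \end{align*}$$

The $0.04(x-600)$ exponent comes from $w_3-w_1=0.02(2x-1200)$.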

$\endgroup$
  • $\begingroup$ This is so elegant. Much appreciated for the solution. $\endgroup$
    – LorenMt
    Commented Aug 18, 2017 at 8:01
