
I am working on some theory related to controls in the context of stochastic games, and I am a bit confused on some terminologies for zero-sum games.

Suppose we have a zero-sum game with two players. Let the first player's input be $U \in \mathbb{R}^{m}$; this player's goal is to minimize the payoff function $J(U,V)$. Let the second player's input be $V \in \mathbb{R}^{\ell}$; this player's goal is to maximize the payoff function $J(U,V)$. In my context, the first player is called the "controller" and the second player is called the "stopper".

I am currently reading a paper that states the following:

The upper game is a scheme in which the stopper chooses $V$ based on the information it has on the control $U$, and the upper value is defined as

$$ \mathcal{V}^+ = \inf_{U\in\mathbb{R}^{m}}\sup_{V\in\mathbb{R}^{\ell}} J(U,V). $$

Similarly, the lower game is a scheme in which the controller chooses $U$ based on the information it has on the control $V$, and the lower value is defined by

$$ \mathcal{V}^{-} = \sup_{V\in\mathbb{R}^{\ell}}\inf_{U\in\mathbb{R}^{m}} J(U,V). $$

Now I know that $\mathcal{V}^{-} \leq \mathcal{V}^{+}$ always holds, and that the two values are equal exactly when the game has a saddle point (equilibrium), which is what the minimax theorem guarantees under suitable conditions.
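For concreteness, here is a small finite toy game (my own illustration, not from the paper) showing that the inequality can be strict; the payoff matrix below is an assumption made purely for this sketch.

```python
# Toy zero-sum payoff matrix: the controller (minimizer) picks a row,
# the stopper (maximizer) picks a column. This matrix is an arbitrary
# "matching pennies"-style example, not taken from the paper.
J = [[1.0, -1.0],
     [-1.0, 1.0]]

rows = range(len(J))
cols = range(len(J[0]))

# Upper value: the controller commits first, the stopper best-responds.
upper = min(max(J[u][v] for v in cols) for u in rows)

# Lower value: the stopper commits first, the controller best-responds.
lower = max(min(J[u][v] for u in rows) for v in cols)

print(lower, upper)  # -1.0 1.0, so lower <= upper holds strictly here
```

The gap between $-1$ and $1$ reflects the advantage of being the player who moves second.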

My question is about the definitions of the upper and lower games. Looking at the expression for the upper value, how does it make sense that the upper game is one in which the stopper plays based on the information it has on the controller, when the inner optimization is over $V$ and the outer optimization is over $U$? The way I am interpreting the upper value is that the stopper optimizes the payoff first, and then, based on the optimal stopper input, the controller optimizes the resulting payoff; that is, the stopper acts first and the controller best-responds afterwards.

Can someone please explain this to me?


1 Answer


You have the order of the players' choices the wrong way round. In the upper game, the stopper is allowed to make his or her choice of $V$ depend on the controller's choice of $U$. In practice, he or she would only be able to do this if the controller had already chosen $U$. That is, it is the controller, not the stopper, who makes the first choice.

If the controller chooses $U$, then the stopper should choose $V^*(U)$ to make the payoff $J\big(U,V^*(U)\big)$ as close as possible to $\sup_{V\in\mathbb{R}^\ell}J(U,V)$. When the controller chooses $U$, however, he or she can reasonably suppose that the stopper, knowing the value of $U$, will choose $V^*(U)$, so the best value for him or her to choose is one that makes $\sup_{V\in\mathbb{R}^\ell}J(U,V)$ as close as possible to $\inf_{U\in\mathbb{R}^m}\sup_{V\in\mathbb{R}^\ell}J(U,V)$.
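As a minimal numerical sketch of this two-stage reasoning (the payoff $J$ and the coarse one-dimensional grids below are stand-ins I chose for illustration, not anything from the paper or the answer above), one can compute the stopper's best response $V^*(U)$ for each candidate $U$ and then let the controller minimize over the resulting values:

```python
import numpy as np

# A discretized stand-in problem, chosen only to illustrate the
# inner-sup / outer-inf structure of the upper value.
def J(u, v):
    return (u - 1.0) ** 2 - (v - 2.0) ** 2 + u * v

U_grid = np.linspace(-3, 3, 61)
V_grid = np.linspace(-3, 3, 61)

def stopper_best_response(u):
    # V*(u): seeing u, the stopper maximizes the payoff.
    return max(V_grid, key=lambda v: J(u, v))

# The controller anticipates V*(u) and minimizes the resulting payoff,
# which is exactly inf_u sup_v J(u, v) restricted to the grid.
u_star = min(U_grid, key=lambda u: J(u, stopper_best_response(u)))
upper_value = J(u_star, stopper_best_response(u_star))
print(u_star, upper_value)
```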

  • Ok, I think I get it. Let me summarize here so you can validate my reasoning: in the upper game, fix some $U$; the stopper then chooses $V$ to maximize the payoff, and given $V^*(U)$ the controller chooses $U^*(V^*)$ to minimize the payoff. For the lower game, fix some $V$; the controller then chooses $U$ to minimize the payoff, and given $U^*(V)$ the stopper chooses $V^*(U^*)$ to maximize the payoff. Correct? And at equilibrium these two should be equal! Commented Nov 22, 2020 at 5:17
  • Yes, you've got it. But in general, when each player has to make his or her choice without knowing the choice made by the other, a saddle point ("equilibrium") in pure strategies (i.e. single choices of $U$ and $V$) will not usually exist. In these circumstances the inequality $\mathcal{V}^-\le \mathcal{V}^+$ will usually (but not always) be strict. This can sometimes be remedied by allowing the players to choose their strategies randomly in secret ("mixed strategies") and replacing the payoff function by its expected value; see the sketch after these comments. Commented Nov 22, 2020 at 6:27
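To illustrate that last comment, here is a rough sketch (reusing my own matching-pennies-style matrix from earlier, with a simple grid over mixing probabilities; both are assumptions made for illustration) of how randomization closes the gap between the lower and upper values:

```python
import numpy as np

# Same matching-pennies-style matrix as before, but now each player may
# randomize: p is the controller's probability of row 0, q is the
# stopper's probability of column 0. Expected payoff is bilinear in (p, q).
J = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def expected_payoff(p, q):
    x = np.array([p, 1.0 - p])
    y = np.array([q, 1.0 - q])
    return x @ J @ y

grid = np.linspace(0.0, 1.0, 101)  # grid over mixing probabilities

# Upper value over mixed strategies: the controller commits to p first.
upper = min(max(expected_payoff(p, q) for q in grid) for p in grid)
# Lower value over mixed strategies: the stopper commits to q first.
lower = max(min(expected_payoff(p, q) for p in grid) for q in grid)

print(lower, upper)  # both 0.0: the gap closes once mixing is allowed
```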
