
Italy is playing the U.S.A. in a football World Cup match. A pass is successful when a player kicks the ball to a teammate and it is not intercepted by the opposition. Is it possible for Italy to complete a higher proportion of its passes than the U.S. in both the first and second halves, and yet for the U.S. to complete a higher proportion of its passes over the game as a whole?

It is possible, due to Simpson's paradox. For example, suppose Italy's successful passing rates in the first and second halves are $\frac{60}{120}$ and $\frac{60}{60}$ respectively, and the US's are $\frac{10}{50}$ and $\frac{190}{200}$. Then Italy has a better successful passing rate than the US in each half. But overall, Italy's successful passing rate is $\frac{120}{180} = \frac{2}{3}$, whereas the US's is $\frac{200}{250} = \frac{4}{5}.$
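The arithmetic in this example can be checked exactly with Python's `fractions.Fraction`, which avoids floating-point rounding. A minimal sketch, with the counts taken from the example above (the helper names are my own):

```python
from fractions import Fraction

# (successful, attempted) passes per half, from the example above
italy = [(60, 120), (60, 60)]
usa = [(10, 50), (190, 200)]

def rate(successes, attempts):
    """Exact success rate as a fraction."""
    return Fraction(successes, attempts)

def overall(halves):
    """Pool both halves: total successes over total attempts."""
    return rate(sum(s for s, _ in halves), sum(a for _, a in halves))

for half in range(2):
    # Italy does better in each half...
    assert rate(*italy[half]) > rate(*usa[half])

# ...but the U.S. does better over the whole game.
print(overall(italy), overall(usa))  # 2/3 4/5
assert overall(usa) > overall(italy)
```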

My question is whether we can find the set of all solutions to this kind of problem.

  • What happens if a team doesn't make any pass attempts at all?
    – JMP
    Commented Dec 12, 2019 at 15:35
  • @JMP, I think in that case the success rate is undefined, and therefore it's not a Simpson example.
    – Hew Wolff
    Commented Dec 12, 2019 at 16:14

2 Answers


Let $a_i,\;\; i=1,2$ be the number of attempts made by Team A in the first and second halves respectively. Let $s_i$ be the number of successful passes, with $b_i, t_i$ the corresponding values for Team B. I have assumed $a_i,b_i\gt0$.

$$\frac{t_1}{b_1}\lt \frac{s_1}{a_1}$$ $$\frac{t_2}{b_2}\lt \frac{s_2}{a_2}$$ $$\frac{t_1+t_2}{b_1+b_2}\gt \frac{s_1+s_2}{a_1+a_2}$$

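Geometrically, if each half is drawn as a vector (attempts, successes), Team A's vector is steeper than Team B's in both halves but not overall. Whether this can happen depends on the fractions without simplification, i.e. on the actual counts rather than just the rates, because the overall rate $\frac{s_1+s_2}{a_1+a_2}$ is the mediant of the two half-rates.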
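To illustrate why the counts and not just the rates matter: two records with identical half-by-half rates can pool to different overall rates, because the mediant is computed from the unreduced numerators and denominators. A small sketch with made-up counts (hypothetical, not from the question):

```python
from fractions import Fraction

def mediant(p, q):
    """Mediant of two fractions given as (numerator, denominator) pairs.
    Note: this uses the raw counts, not the reduced fractions."""
    (n1, d1), (n2, d2) = p, q
    return Fraction(n1 + n2, d1 + d2)

# Same rates in each half (1/2, then 1/1), different numbers of attempts.
light = mediant((1, 2), (1, 1))   # 2/3
heavy = mediant((2, 4), (1, 1))   # 3/5: more weight on the weaker half
print(light, heavy)
assert Fraction(1, 2) == Fraction(2, 4)  # the first-half rates are equal...
assert light != heavy                    # ...but the pooled rates differ
```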

There are more examples at Cut-The-Knot.

This image from the Wikipedia page shows the 'sweet spot':

[Figure: graphic mediant]

Suppose $\frac ab$ and $\frac cd$ are Team B's rates, so that both of Team A's lines are steeper. If Team A's mediant lies in the light green region, anywhere between the extended vectors of $\frac ab$ and $\frac{a+c}{b+d}$, then we have achieved a Simpson's paradox. For example, see the green line I've added to the image.

[Figure: graphic mediant 2]

As there are $8$ unknowns, each playing a different role, there are eight different forms of the solution, depending on which unknown you wish to determine from the other seven.

For example, if we wish to determine $t_1$ and we are given the rest, we know:

$$\frac{(s_1+s_2)(b_1+b_2)}{a_1+a_2}-t_2\lt t_1\lt \frac{s_1 b_1}{a_1}$$

remembering that $t_1$ must be a non-negative integer with $t_1\le b_1$.
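Turning the two strict bounds into an explicit list of integers takes a floor and a ceiling. A minimal sketch (the function `t1_range` is my own name; I also assume $0\le t_1\le b_1$, and the other-half condition $\frac{t_2}{b_2}<\frac{s_2}{a_2}$ is not checked here), using the numbers from the example below:

```python
import math
from fractions import Fraction

def t1_range(a1, a2, s1, s2, b1, b2, t2):
    """Integer values of t1 satisfying
         (s1+s2)(b1+b2)/(a1+a2) - t2 < t1 < s1*b1/a1,
    additionally restricted to 0 <= t1 <= b1 (assumed: a count of passes).
    The condition t2/b2 < s2/a2 on the other half is not checked here."""
    lower = Fraction((s1 + s2) * (b1 + b2), a1 + a2) - t2
    upper = Fraction(s1 * b1, a1)
    lo = max(math.floor(lower) + 1, 0)   # smallest integer strictly above `lower`
    hi = min(math.ceil(upper) - 1, b1)   # largest integer strictly below `upper`
    return range(lo, hi + 1)             # empty if lo > hi

# Team A: a1=100, a2=150, s1=50, s2=100; Team B: b1=40, b2=120, t2=78
print(list(t1_range(100, 150, 50, 100, 40, 120, 78)))  # [19]
```

Exact fractions are used so that a bound landing exactly on an integer (here $20$ and $96-t_2$) is treated as strict rather than being blurred by rounding.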

An example is $a_1=100,a_2=150, s_1=50, s_2=100$ for Team A. Then if $b_1=40, b_2=120$, we need $t_1\lt20, t_2\lt80$ and

$$\frac{150\cdot160}{250}-t_2=96-t_2\lt t_1\lt\frac{50\cdot40}{100}=20$$

Since $t_1$ is an integer less than $20$, we need $96-t_2\lt 19$, so $77\lt t_2\lt80$, i.e. $t_2\in\{78,79\}$.
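A brute-force enumeration over every possible pair $(t_1, t_2)$ confirms this range for the example and shows exactly which pairs work. A minimal Python sketch, using exact fractions:

```python
from fractions import Fraction

# Team A's counts and Team B's attempts from the example above
a1, a2, s1, s2 = 100, 150, 50, 100
b1, b2 = 40, 120

solutions = [
    (t1, t2)
    for t1 in range(b1 + 1)          # Team B's first-half successes
    for t2 in range(b2 + 1)          # Team B's second-half successes
    if Fraction(t1, b1) < Fraction(s1, a1)                       # A wins half 1
    and Fraction(t2, b2) < Fraction(s2, a2)                      # A wins half 2
    and Fraction(t1 + t2, b1 + b2) > Fraction(s1 + s2, a1 + a2)  # B wins overall
]
print(solutions)  # [(18, 79), (19, 78), (19, 79)]
```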

The trick is to compare $\frac{s_1+s_2}{a_1+a_2}$ with $\frac{s_1}{a_1}$: once we know which is larger, changing the value of $b_1$ lets us move the valid range for $t_2$ as we want.
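To see this numerically, here is a sketch that fixes Team A's counts and $b_2=120$ from the example and varies $b_1$ (the values of $b_1$ tried, and the name `t2_range`, are my own choices). Because Team A's overall rate $\frac35$ exceeds its first-half rate $\frac12$, each extra first-half attempt by Team B raises the pooled target faster than it raises the cap on $t_1$, so the valid range for $t_2$ shrinks as $b_1$ grows:

```python
import math
from fractions import Fraction

# Team A's counts and Team B's second-half attempts, as in the example above
a1, a2, s1, s2, b2 = 100, 150, 50, 100, 120

def t2_range(b1):
    """Values of t2 for which some integer t1 gives a Simpson's paradox."""
    t1_max = math.ceil(Fraction(s1 * b1, a1)) - 1   # largest t1 with t1/b1 < s1/a1
    t2_max = math.ceil(Fraction(s2 * b2, a2)) - 1   # largest t2 with t2/b2 < s2/a2
    pooled = Fraction((s1 + s2) * (b1 + b2), a1 + a2)
    # t1 + t2 > pooled is easiest to satisfy with t1 = t1_max
    t2_min = max(math.floor(pooled - t1_max) + 1, 0)
    return range(t2_min, t2_max + 1)

for b1 in (10, 20, 40, 60):
    print(b1, list(t2_range(b1)))
```

With these values, $b_1=40$ reproduces $\{78,79\}$, smaller $b_1$ widens the range (e.g. $b_1=10$ gives $75\le t_2\le79$), and at $b_1=60$ no $t_2$ works at all.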

  • I see. This is also called freshman sum.
    – Idonknow
    Commented Dec 12, 2019 at 23:01

Let's say:

$n_1$: number of successful passes by Italy in first half

$n_2$: number of attempted passes by Italy in first half

$n_3$: number of successful passes by U.S. in first half

$n_4$: number of attempted passes by U.S. in first half

$n_5$: number of successful passes by Italy in second half

$n_6$: number of attempted passes by Italy in second half

$n_7$: number of successful passes by U.S. in second half

$n_8$: number of attempted passes by U.S. in second half

Of course, we must have:

$$n_1 \leq n_2$$

$$n_3 \leq n_4$$

$$n_5 \leq n_6$$

$$n_7 \leq n_8$$

Now, we want Italy to have a higher rate of successful passes in each of the halves, but the U.S. to have a better rate over the whole game:

$$\frac{n_1}{n_2} > \frac{n_3}{n_4}$$

$$\frac{n_5}{n_6} > \frac{n_7}{n_8}$$

$$\frac{n_3+n_7}{n_4+n_8} > \frac{n_1+n_5}{n_2+n_6}$$

which, since all the attempt counts are positive, means:

$$n_1 \cdot n_4 > n_3 \cdot n_2$$

$$n_5 \cdot n_8 > n_7 \cdot n_6$$

$$(n_3 + n_7) \cdot (n_2+ n_6) > (n_1 +n_5)\cdot (n_4 +n_8)$$

Expanding the last inequality and then applying the first two, we get:

$$\begin{aligned}
n_3\cdot n_2 + n_3 \cdot n_6 + n_7 \cdot n_2 + n_7 \cdot n_6
&> n_1\cdot n_4 + n_1 \cdot n_8 + n_5 \cdot n_4 + n_5 \cdot n_8\\
&> n_3\cdot n_2 + n_1 \cdot n_8 + n_5 \cdot n_4 + n_7 \cdot n_6
\end{aligned}$$

Cancelling $n_3 \cdot n_2$ and $n_7 \cdot n_6$ from both ends, we must have:

$$n_3 \cdot n_6 + n_7 \cdot n_2 > n_1 \cdot n_8 + n_5 \cdot n_4$$

However, note that this latter inequality cannot replace the earlier inequalities; it merely follows from them. In other words, the solution set is still all $n_1$ through $n_8$ for which we have:

$$n_1 \leq n_2$$

$$n_3 \leq n_4$$

$$n_5 \leq n_6$$

$$n_7 \leq n_8$$

$$n_1 \cdot n_4 > n_3 \cdot n_2$$

$$n_5 \cdot n_8 > n_7 \cdot n_6$$

$$(n_3 + n_7) \cdot (n_2+ n_6) > (n_1 +n_5)\cdot (n_4 +n_8)$$

But we can use the following as a further 'check':

$$n_3 \cdot n_6 + n_7 \cdot n_2 > n_1 \cdot n_8 + n_5 \cdot n_4$$

That is, any values of $n_1$ through $n_8$ that do not obey the latter inequality will not be a solution.
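The distinction between the defining inequalities and the derived 'check' can be tested directly. A minimal Python sketch (the function names are my own, and the random test only spot-checks the implication rather than proving it):

```python
import random

def is_simpson(n1, n2, n3, n4, n5, n6, n7, n8):
    """The full solution set: counts are valid and Italy wins each half
    while the U.S. wins overall (cross-multiplied, so attempts must be > 0)."""
    if min(n2, n4, n6, n8) <= 0:
        return False
    valid = 0 <= n1 <= n2 and 0 <= n3 <= n4 and 0 <= n5 <= n6 and 0 <= n7 <= n8
    return (valid
            and n1 * n4 > n3 * n2
            and n5 * n8 > n7 * n6
            and (n3 + n7) * (n2 + n6) > (n1 + n5) * (n4 + n8))

def passes_check(n1, n2, n3, n4, n5, n6, n7, n8):
    """The derived necessary condition (the 'check')."""
    return n3 * n6 + n7 * n2 > n1 * n8 + n5 * n4

# Every solution found should pass the check (random spot test, not a proof).
rng = random.Random(0)
for _ in range(50_000):
    n2, n4, n6, n8 = (rng.randint(1, 10) for _ in range(4))
    n = (rng.randint(0, n2), n2, rng.randint(0, n4), n4,
         rng.randint(0, n6), n6, rng.randint(0, n8), n8)
    if is_simpson(*n):
        assert passes_check(*n)

# The check alone is not sufficient: all ones except a large n6
# (the case raised in the comments) passes the check but is not a solution.
example = (1, 1, 1, 1, 1, 100, 1, 1)
print(passes_check(*example), is_simpson(*example))  # True False
```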

I'd like to note that the inequalities cannot be satisfied if both teams made the same number of attempts in each of the halves. That is, we cannot have $n_2=n_4=n_I$ and $n_6=n_8=n_{II}$, for then we would need to have:

$$n_1 > n_3$$

$$n_5 > n_7$$

and that would contradict our 'check':

$$n_3 \cdot n_{II} + n_7 \cdot n_I > n_1 \cdot n_{II} + n_5 \cdot n_I$$
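The argument above can also be spot-checked by exhaustive search over small counts. A sketch, assuming at most 8 attempts per team per half (an arbitrary bound of mine; the algebraic argument covers all cases):

```python
from itertools import product

def is_simpson(n1, n2, n3, n4, n5, n6, n7, n8):
    """Italy wins each half, the U.S. wins overall (attempts assumed > 0)."""
    return (n1 * n4 > n3 * n2
            and n5 * n8 > n7 * n6
            and (n3 + n7) * (n2 + n6) > (n1 + n5) * (n4 + n8))

# Equal attempts: n2 = n4 = nI and n6 = n8 = nII, searched over a small range.
found = [
    (n1, nI, n3, nI, n5, nII, n7, nII)
    for nI, nII in product(range(1, 9), repeat=2)
    for n1, n3 in product(range(nI + 1), repeat=2)
    for n5, n7 in product(range(nII + 1), repeat=2)
    if is_simpson(n1, nI, n3, nI, n5, nII, n7, nII)
]
print(found)  # []
```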

However, we can have the same number of attempts for one of the halves. As a simple example: Italy completed $1$ out of $2$ passes in the first half, but the U.S. $0$ out of $2$, and Italy completed $1$ out of $1$ passes in the second half, and the U.S. $7$ out of $8$.

This latter example also shows that Italy can have a $100$% completion rate in one half, and the U.S. a $0$% completion rate in one of its halves, and yet the U.S. can still obtain a better completion rate for the whole game.

Finally, the 'smallest' numbers satisfying the inequalities (where by smallest I mean minimizing the maximum of $n_1$ through $n_8$) have Italy completing $1$ out of $3$ passes in the first half and the U.S. $0$ out of $1$, while in the second half Italy completes $1$ out of $1$ passes and the U.S. $3$ out of $4$.
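This minimality claim can be confirmed by a search that tries every combination with all counts at most $m$, for $m = 1, 2, 3, \dots$ in turn. A minimal sketch (`smallest_example` is a hypothetical helper name of mine):

```python
from itertools import product

def is_simpson(n1, n2, n3, n4, n5, n6, n7, n8):
    """Italy wins each half, the U.S. wins overall (attempts assumed > 0)."""
    return (n1 * n4 > n3 * n2
            and n5 * n8 > n7 * n6
            and (n3 + n7) * (n2 + n6) > (n1 + n5) * (n4 + n8))

def smallest_example():
    """Return (m, counts) for the smallest m such that some solution
    has every n_i <= m, searching m = 1, 2, 3, ... in order."""
    m = 1
    while True:
        # every (successes, attempts) record with 1 <= attempts <= m
        records = [(s, a) for a in range(1, m + 1) for s in range(a + 1)]
        for (n1, n2), (n3, n4), (n5, n6), (n7, n8) in product(records, repeat=4):
            if is_simpson(n1, n2, n3, n4, n5, n6, n7, n8):
                return m, (n1, n2, n3, n4, n5, n6, n7, n8)
        m += 1

m, counts = smallest_example()
print(m, counts)
```

This reports $m=4$, agreeing with the claim above. The particular tuple found depends on the search order and may differ from the one quoted, since minimizers are not unique (swapping the roles of the two halves gives another).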

  • This seems fishy to me. Your three inequalities turned into one? What if all $n_i$ are equal to 1, except $n_6$, which is huge?
    Commented Dec 12, 2019 at 15:03
  • @MeesdeVries Yes, I just realized my mistake as well :( I can't just get rid of the earlier inequalities; I can only add the last inequality as a kind of 'check': any values that do not obey that inequality will not be a solution.
    – Bram28
    Commented Dec 12, 2019 at 15:04
