101
$\begingroup$

When solving a problem, we often look at some special cases first, then try to work our way up to the general case.

It would be interesting to see some counterexamples to this mental process, i.e. problems that become easier when you formulate them in a more general (or ambitious) form.


Motivation/Example

Recently someone asked for the solution of $a,b,c$ such that $\frac{a}{b+c} = \frac{b}{a+c} = \frac{c}{a+b} (=t).$

Someone suggested writing this down as a system of linear equations in terms of $t$ and solving for $a,b,c$. It turns out that either (i) $a=b=c$ or (ii) $a+b+c=0$.

Solution (i) is obvious from looking at the problem, but (ii) was not apparent to me until I solved the system of equations.

Then I wondered how this would generalize to more variables, and wrote the problem as: $$ \frac{x_i}{\sum x - x_i} = \frac{x_j}{\sum x - x_j} \quad \forall i,j\in\{1,2,\dots,n\} $$

Looking at this formulation, both solutions became immediately evident without the need for linear algebra (for (ii), set $\sum x=0$ so that each denominator cancels out with its numerator).
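As a quick numeric illustration (my addition, not part of the original question), both solution families can be checked directly:

```python
# Numeric check of the two solution families: (i) a = b = c gives equal
# ratios trivially; (ii) a + b + c = 0 gives b + c = -a, so each ratio is -1.
def ratios(a, b, c):
    return (a / (b + c), b / (a + c), c / (a + b))

print(ratios(3.0, 3.0, 3.0))   # case (i): all ratios equal 0.5
print(ratios(1.0, 2.0, -3.0))  # case (ii): all ratios equal -1.0
```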

$\endgroup$
7
  • 2
    $\begingroup$ The linked Question : math.stackexchange.com/questions/897118/… $\endgroup$ Commented Aug 16, 2014 at 12:16
  • 7
    $\begingroup$ Well, I personally don't think this is actually simplification by generalization, but rather simplification motivated by generalization. Like if you write $\frac{a}{b+c}$ as $\frac{a}{(a + b + c) - a}$, you'd say that the problem becomes obvious too. $\endgroup$
    – Tunococ
    Commented Aug 16, 2014 at 12:59
  • 1
    $\begingroup$ @Tunococ Fair enough, I'm just saying that writing it like this didn't occur to me until I thought of generalizing it. I understand that this is all a little subjective (hence the "soft-question" tag). $\endgroup$
    – MGA
    Commented Aug 16, 2014 at 13:01
  • 6
    $\begingroup$ This is relevant. $\endgroup$
    – ocg
    Commented Aug 16, 2014 at 13:58
  • 2
    $\begingroup$ Introducing topology makes certain proofs in real analysis clearer and more elegant, though I'm not sure whether they are necessarily easier per se. $\endgroup$ Commented Aug 17, 2014 at 0:24

14 Answers

114
$\begingroup$

Consider the following integral $\displaystyle\int_{0}^{1}\dfrac{x^7-1}{\ln x}\,dx$. All attempts at finding an antiderivative fail because the antiderivative isn't expressible in terms of elementary functions.

Now consider the more general integral $f(y) = \displaystyle\int_{0}^{1}\dfrac{x^y-1}{\ln x}\,dx$.

We can differentiate with respect to $y$ and evaluate the resulting integral as follows:

$f'(y) = \displaystyle\int_{0}^{1}\dfrac{d}{dy}\left[\dfrac{x^y-1}{\ln x}\right]\,dx = \int_{0}^{1}x^y\,dx = \left[\dfrac{x^{y+1}}{y+1}\right]_{0}^{1} = \dfrac{1}{y+1}$.

Since $f'(y) = \dfrac{1}{y+1}$, we have $f(y) = \ln(y+1)+C$ for some constant $C$.

Trivially, $f(0) = \displaystyle\int_{0}^{1}\dfrac{x^0-1}{\ln x}\,dx = \int_{0}^{1}0\,dx = 0$. Hence $C = 0$, and thus, $f(y) = \ln(y+1)$.

Therefore, our original integral is $\displaystyle\int_{0}^{1}\dfrac{x^7-1}{\ln x}\,dx = f(7) = \ln 8$.

This technique of generalizing an integral by introducing a parameter and differentiating with respect to that parameter is known as Feynman integration (differentiation under the integral sign).
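As a numeric sanity check of the result (my addition, not part of the answer), one can approximate the original integral with a simple midpoint rule and compare it with $\ln 8$:

```python
import math

# Midpoint-rule approximation of f(7) = integral from 0 to 1 of (x^7 - 1)/ln x dx.
# The midpoints avoid the endpoints, where the integrand takes its limiting
# values (0 as x -> 0+, and 7 as x -> 1-).
def integrand(x, y=7):
    return (x**y - 1) / math.log(x)

n = 100_000
approx = sum(integrand((k + 0.5) / n) for k in range(n)) / n
print(approx, math.log(8))  # both approximately 2.0794
```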

$\endgroup$
8
  • 10
    $\begingroup$ Great example! Really neat $\endgroup$
    – seldon
    Commented Aug 16, 2014 at 19:59
  • 2
    $\begingroup$ My favorite illustration so far. Of course the others are nice too. $\endgroup$
    – MGA
    Commented Aug 16, 2014 at 20:22
  • $\begingroup$ The special form (not only the general form) can also easily be determined by setting $\ln x=-t$ and then use Frullani's integral. $\endgroup$
    – Tunk-Fey
    Commented Aug 17, 2014 at 14:44
  • $\begingroup$ The integrand does have an anti-derivative in terms of the upper incomplete gamma function: $\Gamma(0, -\ln x) - \Gamma(0, -8\,\ln x)$. That still doesn't help because it's undefined at both limits of integration. $\endgroup$ Commented Aug 17, 2014 at 21:52
  • $\begingroup$ @Tunk-Fey You're right, apologies. Deleting my comment. $\endgroup$
    – MGA
    Commented Aug 18, 2014 at 11:39
32
$\begingroup$

I recall something like this coming up when evaluating certain summations. For example, consider:

$$ \sum_{n=0}^{\infty} {n \over 2^n} $$

We can generalize this by letting $f(x) = \sum_{n=0}^{\infty} nx^n$, so:

$$ \begin{align} {f(x) \over x} &= \sum_{n=0}^{\infty} nx^{n-1} \\ &= {d \over dx} \sum_{n=0}^{\infty} x^n \\ &= {d \over dx} {1 \over {1-x}} = {1 \over (x-1)^2} \end{align} $$

Therefore,

$$ f(x) = {x \over (x-1)^2} $$

The solution to the original problem is $f({1 \over 2}) = 2$.
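A quick partial-sum check (my addition) confirms the closed form at $x = 1/2$:

```python
# Compare a truncated version of f(x) = sum n x^n against the closed form
# x / (x - 1)^2 derived above; 200 terms is far more than enough at x = 1/2.
def f(x, terms=200):
    return sum(n * x**n for n in range(terms))

closed_form = lambda x: x / (x - 1) ** 2
print(f(0.5), closed_form(0.5))  # both approximately 2.0
```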

$\endgroup$
1
  • 8
    $\begingroup$ Interestingly, this also shows $\sum_{n=0}^\infty \frac{1}{2^n} = \sum_{n=0}^\infty \frac{n}{2^n}$, which succeeded at surprising me. :) $\endgroup$
    – Keba
    Commented Aug 3, 2015 at 23:26
26
$\begingroup$

George Polya's book How to Solve It calls this phenomenon "The Inventor's Paradox": "The more ambitious plan may have more chances of success." The book gives several examples, including the following.

1) Consider the problem: "A straight line and a regular octahedron are given in position. Find a plane that passes through the given line and bisects the volume of the given octahedron." If we generalize this to "a straight line and a solid with a center of symmetry are given in position..." it becomes very easy. (The plane goes through the center of symmetry and the line.)

The book also gives other examples of the Inventor's Paradox, but "more ambitious" is not always the same as "more general." Consider: "Prove that $1^3 + 2^3 + 3^3 + ... + n^3$ is a perfect square." Polya shows that it is easier to prove (by mathematical induction) that "$1^3 + 2^3 + 3^3 + ... + n^3 = (1 + 2 + 3 + ...+ n)^2$". This is more ambitious but is not more general.

ADDED LATER:

The web page Generalizations in Mathematics gives many similar examples. It even gets into the difference between "more ambitious" and "more general."

$\endgroup$
3
  • $\begingroup$ Good point regarding generality/ambitiousness, and both are interesting. I have made a minor edit to the question to reflect this. $\endgroup$
    – MGA
    Commented Aug 16, 2014 at 12:51
  • 10
    $\begingroup$ In the first the easy approach is "more general" in the sense that it picks out the significant property of an entity and ignores any other particular properties it may have. There are loads of examples like it, for example any time you're asked to prove some property of a particular group that's true of all Abelian groups, or some property of a particular function that's true of any continuous monotonic function, and so on. The problem is, "identify the useful property of this object", so the more general problem where you only have the useful property is always easier ;-) $\endgroup$ Commented Aug 16, 2014 at 19:26
  • $\begingroup$ The cubes and square example, termed 'more ambitious', is an example of the technique of solving a problem by the addition of an invented constraining hypothesis, and thus is actually a case of particularization. $\endgroup$
    – Jose Brox
    Commented Aug 20, 2014 at 8:00
16
$\begingroup$

The solution to the Monty Hall problem

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, follows the fixed protocol of opening another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

becomes more obvious when you generalize it to an $N$-door problem with the host opening $N-2$ doors. For $N\gg3$ most people's intuition revolts against staying with the original choice.
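A small Monte Carlo sketch (my addition) of the $N$-door variant, under the stated assumption that the host always opens $N-2$ goat doors and always offers the switch:

```python
import random

# With N-2 goat doors opened, exactly one unopened door remains besides the
# player's pick, so switching wins precisely when the first pick was wrong.
def win_rate(n_doors, switch, trials=20_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(n_doors)
        pick = random.randrange(n_doors)
        if switch:
            wins += pick != car
        else:
            wins += pick == car
    return wins / trials

print(win_rate(3, switch=True))    # approximately 2/3
print(win_rate(100, switch=True))  # approximately 99/100
```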

$\endgroup$
9
  • 3
    $\begingroup$ A more helpful approach to the standard Monty Hall is to examine what is wrong with the above formulation. The story, as told here, is consistent with a game show where the host offers a chance to switch only to contestants who have already chosen the correct door. When playing that game, you should never switch regardless of how many doors there are. In the standard problem, it is explicitly stated that the host will always offer the switch. That's what makes the odds $N-1$ to $1$ in favor of the switch; and $2$-to-$1$ odds is good enough. $\endgroup$
    – David K
    Commented Aug 16, 2014 at 15:22
  • $\begingroup$ @DavidK - the text I included is Wikipedia's description of the Monty Hall problem. Have modified it. $\endgroup$
    – Johannes
    Commented Aug 16, 2014 at 20:20
  • 2
    $\begingroup$ The problem has been underspecified in some widely-read sources; no surprise one of those got copied to Wikipedia. I'll accept this version as implying the player knows the fixed protocol. In that case, if someone's intuition requires $N$ to be increased above $3$ before they think the switch is beneficial, they do not understand the reason why it is beneficial. They have merely been nudged from an incorrect guess to an unjustified but luckily correct guess. $\endgroup$
    – David K
    Commented Aug 18, 2014 at 23:12
  • $\begingroup$ @DavidK: I think that would be a fair criticism of someone who claims to be game-theoretically rational, but I doubt such a person exists; would a reasonable person expect a Bayes factor of $2:1$ to come to his rescue? Even though it wouldn't be necessary to the idealised mathematical observer with the eyes of a hawk, in many contexts exaggerating the difference helps me notice there is one, whence I can see via a monotonicity argument that the effect is non-zero (if weak) for $N \gtrsim 3$. $\endgroup$ Commented Feb 16, 2015 at 19:27
  • 1
    $\begingroup$ Heck, what I thought negligible was a raw probability of 1/6, which I ought to appreciate even then. $\endgroup$ Commented Feb 16, 2015 at 21:25
8
$\begingroup$

I'm not entirely convinced that problems being made easier by generalization is exactly what is going on here.

In the example provided in your question, what made the solution to the general problem appear easier is that it dawned on you that $$x_1+\cdots+x_{j-1}+x_{j+1}+\cdots+x_n=\sum_{i=1}^nx_i-x_j.$$ Indeed, (as tunococ commented) had the less general problem been written as $$\frac{a}{a+b+c-a}=\frac{b}{a+b+c-b}=\frac{c}{a+b+c-c},$$ then your easier solution to the general problem applies here as well. I would argue that, if anything, the generalization helped you notice a pattern you had not before seen. Would you still have noticed this pattern had you not formulated the problem in a general way? Perhaps, perhaps not.

In my opinion, what your experience shows is that formulating a problem $Q$ in more general terms $P$ is one of many ways by which one can gain a fundamental insight that provides the key to the solution of the general problem $P$ (and thus inevitably also solves the initial special case $Q$). Sometimes, this can lead to a solution that was as yet unknown to you and that will be more elegant or easier than the previous solutions. However, given that such an insight could easily have come without generalizing the problem, the fact that the solution did come from you thinking about the generalization seems highly circumstantial to me.

EDIT: JimmyK4542's example (and Feynman's integration trick) seems like a spectacular demonstration of the phenomenon, however.

$\endgroup$
2
  • 1
    $\begingroup$ I merely meant that as an example to get the discussion started, and you're of course right that in this case one doesn't have to generalize to see the solution. But personally, I didn't think of writing $+a-a$ etc. because I didn't feel the need to. Once I considered the general case, I was compelled to write it like this. Anyway, I think that better examples have now been given on here that really illustrate the point - I'm particularly fond of the Feynman integration trick. $\endgroup$
    – MGA
    Commented Aug 16, 2014 at 20:16
  • 2
$\begingroup$ JimmyK4542's example (and Feynman's integration trick in general) indeed contradicts my claim that a solution to a general problem always applies to a special case. I'll edit this out. $\endgroup$
    – user78270
    Commented Aug 16, 2014 at 23:53
8
$\begingroup$

On this site one frequently finds under the linear-algebra tag questions of the kind: what is the determinant of a matrix $$ \begin{pmatrix}a&b&b&b\\b&a&b&b\\b&b&a&b\\b&b&b&a\end{pmatrix}? $$ (I've just posted this question, which contains a list of such questions). It turns out that finding an answer to this question becomes almost trivial (see my answer to the linked question) when reformulated more generally as

What is the characteristic polynomial of a square matrix$~A$ of rank$~1$?

knowing that by specialisation the answer gives the determinant of $\lambda I-A$ for any scalar$~\lambda$.
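As a numeric illustration (my addition): for the $n\times n$ matrix with $a$ on the diagonal and $b$ elsewhere, i.e. $(a-b)I + bJ$ with $J$ the all-ones rank-$1$ matrix, specializing the rank-$1$ characteristic polynomial $\chi(\lambda)=\lambda^{n-1}(\lambda-\operatorname{tr})$ at $\lambda = a-b$ gives $\det = (a-b)^{n-1}\,(a+(n-1)b)$, which we can check directly:

```python
# Plain Gaussian elimination determinant (no pivoting needed for this
# diagonally dominant example), compared against the closed form obtained
# by specializing the rank-1 characteristic polynomial.
def det(m):
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        d *= m[i][i]
        for j in range(i + 1, n):
            r = m[j][i] / m[i][i]
            for k in range(n):
                m[j][k] -= r * m[i][k]
    return d

a, b, n = 5.0, 2.0, 4
M = [[a if i == j else b for j in range(n)] for i in range(n)]
print(det(M), (a - b) ** (n - 1) * (a + (n - 1) * b))  # both 297.0
```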

$\endgroup$
6
$\begingroup$

A broadly successful application of this was introduced by Richard Bellman under the phrase dynamic programming. The story of the "birth" of this now foundational topic in applied math is told largely in Bellman's own words here.

A related term gives more evidence of the connection of ideas: invariant imbedding.

A good discussion of dynamic programming references and examples came up early at StackOverflow, but was subsequently closed as off-topic.

An illustration is finding a shortest path between two specified points by "imbedding" that problem in finding all shortest paths from one point, Dijkstra's algorithm.

$\endgroup$
5
$\begingroup$

From time to time, famous problems have this feature. History supports the point: the transcendence of $\pi$ settled the long-standing problem of squaring the circle; analytic geometry and irrationality theory settled doubling the cube; Galois's invention of group theory settled the insolvability of the general quintic; Kummer's invention of ideals paved the way toward Fermat's last theorem; global differential geometry led to Chern's intrinsic proof of the Gauss-Bonnet theorem; and so on.

$\endgroup$
5
$\begingroup$

The most spectacular example I have seen is this one:

Suppose A is an $n\times n$ matrix with eigenvalues $\lambda_1$, ..., $\lambda_n$, including each eigenvalue according to its multiplicity. Then $A^2$ has eigenvalues $\lambda_1^2$, ..., $\lambda_n^2$ including multiplicity.

To prove this directly is in fact very hard. (It's easy to show that $\lambda_1^2$, ..., $\lambda_n^2$ are all eigenvalues of $A^2$ by considering their eigenvectors, but unless the dimensions of the eigenspaces match the multiplicities you're stuck.)

However, the proof of the following statement is actually perfectly possible using elementary arguments (albeit clever arguments):

Suppose A is an $n\times n$ matrix with eigenvalues $\lambda_1$, ..., $\lambda_n$, including each eigenvalue according to its multiplicity. Then for any polynomial $g(x)$, $g(A)$ has eigenvalues $g(\lambda_1)$, ..., $g(\lambda_n)$ including multiplicity.
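A tiny numeric illustration (my addition): take a $2\times 2$ triangular matrix $A$ with eigenvalues $2$ and $3$; for $g(x)=x^2+1$, the statement predicts that $g(A)$ has eigenvalues $g(2)=5$ and $g(3)=10$, which we can read off from the trace and determinant of $g(A)$:

```python
# A is upper triangular, so its eigenvalues are the diagonal entries 2 and 3.
# For a 2x2 matrix, the eigenvalues are the roots of t^2 - tr*t + det.
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[2, 1], [0, 3]]
A2 = matmul(A, A)
gA = [[A2[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)]  # g(A) = A^2 + I
tr = gA[0][0] + gA[1][1]
det = gA[0][0] * gA[1][1] - gA[0][1] * gA[1][0]
print(tr, det)  # 15, 50 -> char poly t^2 - 15 t + 50 = (t - 5)(t - 10)
```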

$\endgroup$
1
  • $\begingroup$ Does this property have a name? $\endgroup$ Commented Nov 4, 2023 at 6:53
4
$\begingroup$

Generalization comes up a lot when doing induction. For example,

$$\forall n ~~ \sum_{k=0}^n 2^{-k} \le 2$$

is difficult to prove directly using induction on $n$. However, if you generalize to a stronger statement:

$$\forall n ~~ \sum_{k=0}^n 2^{-k} \le 2 - 2^{-n}$$

Then induction may be used directly (reading each line below as equivalent to the next, the claim for $n+1$ reduces to the induction hypothesis):

$$\sum_{k=0}^{n+1} 2^{-k} \le 2 - 2^{-n - 1}$$ $$\sum_{k=0}^n 2^{-k} + 2^{-n-1} \le 2 - 2^{-n - 1}$$ $$\sum_{k=0}^n 2^{-k}\le 2 - 2^{-n}$$

Obviously you could see that it is a geometric series, but that is a generalization also.
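A quick check of the strengthened bound (my addition): it in fact holds with equality, since $\sum_{k=0}^n 2^{-k} = 2 - 2^{-n}$ exactly.

```python
# Verify the strengthened inductive bound for small n; powers of 2 are exact
# in binary floating point, so the comparison is exact here.
for n in range(20):
    s = sum(2.0 ** -k for k in range(n + 1))
    assert s <= 2 - 2.0 ** -n
print("bound holds for n = 0..19")
```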

problems that become easier when you formulate them in a more general (or ambitious) form

The potential difficulty of a generalization isn't the only disadvantage. If you disprove a generalization, then you haven't disproven the original theorem. In that respect, a generalization effectively forces you to pick sides in the investigation of a theorem.

$\endgroup$
1
$\begingroup$

A nice example appeared on this web site today: every prime number $p\ge 5$ satisfies $24\mid p^2-1$.

As posed, the problem sounds like it might be difficult. But it is very easy to show the more general result that every $n$ of the form $6k\pm 1$ has the required property.
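The general statement can be checked mechanically (my addition): $(6k\pm 1)^2 - 1 = 12k(3k\pm 1)$, and $k(3k\pm 1)$ is always even.

```python
# Check the generalization: every n of the form 6k +/- 1 satisfies 24 | n^2 - 1.
# (n^2 - 1 = 12 k (3k +/- 1), and k(3k +/- 1) is even for every integer k.)
for k in range(1, 500):
    for n in (6 * k - 1, 6 * k + 1):
        assert (n * n - 1) % 24 == 0
print("verified for all n = 6k +/- 1 with k < 500")
```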

$\endgroup$
1
$\begingroup$

One of my favorite examples of this phenomenon is in the study of the outer measure of closed intervals. Let us begin with the following definitions:

  1. a finite open interval cover of a closed interval $I$ is a finite collection of open intervals $\{(a_i,b_i)\}_{i=1}^n$ whose union contains $I$.
  2. the total length of a finite open interval cover is defined as $\sum_{i=1}^n (b_i-a_i)$.

In one of my classes that I TA'd, I tried to prove the following theorem (as an example application of mathematical induction, beyond the "silly" arithmetical identities/combinatorial trivia that typically accompany introductions to induction --- to date it is my favorite example of induction to use in a real analysis class):

Theorem A: the total length of any finite open interval cover of $[0,1]$ is $>1$.

I had chosen $[0,1]$ for the sake of concreteness, but during the course of giving the proof, I realized that I actually could only pull through with the induction if I was proving the general statement:

Theorem B: the total length of any finite open interval cover of any bounded closed interval $[a,b]$ is $>(b-a)$.

The proof is as follows:

  • Base case: if $n=1$, then we have $(a_1,b_1)\supseteq [a,b]$, and so $a_1<a<b<b_1 \implies b_1-a_1>b-a$.

  • Inductive step: assuming the theorem holds for finite open interval covers of size $n=k$, let us prove it for $n=k+1$. Well, if we have $\mathscr C:= \{(a_i,b_i)\}_{i=1}^{k+1}$ covering $[a,b]$, then some $(a_i,b_i)$ covers the point $b$. By reindexing, we may assume WLOG that $i=k+1$. Then, $[a,a_{k+1}]$ is not covered by $(a_{k+1},b_{k+1})$, but it is covered by $\mathscr C$, so it must be covered by $\{(a_i,b_i)\}_{i=1}^k$. So now we have a finite open interval cover of size $k$ of the closed interval $[a,a_{k+1}]$, so we can apply the induction hypothesis, which tells us that $\sum_{i=1}^k (b_i-a_i)> (a_{k+1}-a)$. Adding $(b_{k+1}-a_{k+1})$ to both sides and using $b_{k+1}>b$ gets us the result.

$\endgroup$
1
$\begingroup$

I'll give two examples of problems that become "conceptually easier/more intuitive" once we generalize (but not technically easier; in fact technically much more difficult due to needing approximation arguments). This is not surprising at all, since the whole point of generalization/abstraction in mathematics is to come up with more "conceptual"/"big picture" understanding, but I think the examples are worth sharing anyway.

  1. Buffon's needle/noodle. The proof sketched in the Wikipedia page is

The probability distribution of the number of crossings depends on the shape of the noodle, but the expected number of crossings does not; it depends only on the length $L$ of the noodle and the distance $D$ between the parallel lines (observe that a curved noodle may cross a single line multiple times). [E.g. if we think of $D$ as fixed, this expectation is some (continuous) function $f(L)$]

This fact may be proved as follows (see Klain and Rota). First suppose the noodle is piecewise linear, i.e. consists of $n$ straight pieces. Let $X_i$ be the number of times the $i$th piece crosses one of the parallel lines. These random variables are not independent, but the expectations are still additive due to the linearity of expectation: $$ \mathbb E ( X_1 + \cdots + X_n ) = \mathbb E ( X_1 ) + \cdots + \mathbb E ( X_n ).$$ [Using this, cutting a linear noodle/needle into $n$ pieces we have that $f(\sum_{i\in I}L_i)=\sum_{i \in I} f(L_i)$, so indeed $f$ is linear in $L$.] Regarding a curved noodle as the limit of a sequence of piecewise linear noodles, we conclude that the expected number of crossings per toss is proportional to the length; it is some constant times the length $L$. Then the problem is to find the constant. In case the noodle is a circle of diameter equal to the distance $D$ between the parallel lines, then $L = \pi D$ and the number of crossings is exactly $2$, with probability $1$. So when $L = \pi D$ then the expected number of crossings is $2$. Therefore, the expected number of crossings must be $2L/(\pi D)$.

The Wiki page also has the proof for just Buffon's needle, which is an exercise in undergraduate-level probability and calculus. In my opinion, this Buffon noodle approach is a very intuitive explanation for the formula (linear in $L$, with a constant factor involving $2$, $\pi$, and $D$ for very intuitive reasons), while the calculus obscures the intuition. However, the Buffon noodle problem is technically much harder, since we have to do a limiting argument for approximating continuous functions (noodles) by piecewise linear ones.
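The $2L/(\pi D)$ formula is easy to test by simulation (my addition): for a straight needle with $L \le D$ there is at most one crossing, so the expected number of crossings equals the crossing probability.

```python
import math, random

# Classic Buffon setup: drop a needle of length L among parallel lines spaced
# D apart. By symmetry, sample the distance x from the needle's center to the
# nearest line and the acute angle theta; the needle crosses iff x <= (L/2) sin(theta).
def crossing_probability(L, D, trials=200_000):
    hits = 0
    for _ in range(trials):
        x = random.uniform(0, D / 2)
        theta = random.uniform(0, math.pi / 2)
        hits += x <= (L / 2) * math.sin(theta)
    return hits / trials

print(crossing_probability(1.0, 1.0), 2 / math.pi)  # both approximately 0.6366
```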

  2. 3blue1brown made a video on the average area of a random projection of a 3D shape onto 2D: https://www.youtube.com/watch?v=ltLUadnCyi0&ab_channel=3Blue1Brown. Alice's approach is really similar to the above Buffon noodle conceptual approach:
    • use linearity of expectation to break down any polyhedron into the sum of the average projected area for each face;
    • observe that the average projected area of each face is some constant times the area of the face;
    • hence the average projected area for the entire (convex) polyhedron is some constant times the surface area;
    • and compute the constant by studying the special case of a sphere, whose average projected area is super easy to compute.

Again the difficulty comes from going from polyhedra to general shapes (e.g. spheres) via some sort of limiting argument.
$\endgroup$
1
  • $\begingroup$ Buffon's Needle is Epic!! I wrote my IB Math IA on it! $\endgroup$ Commented Apr 24, 2023 at 19:24
0
$\begingroup$

Around the year 628, Brahmagupta found a nontrivial integer solution to the equation $92x^2+1=y^2$, an instance of what is now known as Pell's equation or the Brahmagupta-Pell equation [1]. It was solved by considering the more general problem of finding solutions to $92x^2+z=y^2$, and using Brahmagupta's identity to produce a new solution from two given ones:

Given $92x_1^2+z_1=y_1^2$ and $92x_2^2+z_2=y_2^2$, by Brahmagupta's identity, $92(x_1y_2+x_2y_1)^2+z_1z_2=(92x_1x_2+y_1y_2)^2$ is a new solution.

Brahmagupta starts from the easy-to-find solution $92\cdot 1^2+8=10^2$ of the $z=8$ case and uses this method of generating solutions twice to obtain a solution with $z=1$ and $x$, $y$, $z$ integers.
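The composition steps can be traced in code (my addition; the intermediate rational solution $(5/2, 24, 1)$ obtained by scaling is my reconstruction of one standard presentation of the computation):

```python
from fractions import Fraction

N = 92

def compose(s1, s2):
    # Brahmagupta's identity: from N x_i^2 + z_i = y_i^2 (i = 1, 2) produce
    # N (x1 y2 + x2 y1)^2 + z1 z2 = (N x1 x2 + y1 y2)^2.
    (x1, y1, z1), (x2, y2, z2) = s1, s2
    return (x1 * y2 + x2 * y1, N * x1 * x2 + y1 * y2, z1 * z2)

start = (Fraction(1), Fraction(10), Fraction(8))  # 92*1^2 + 8 = 10^2
s = compose(start, start)                         # gives z = 64
s = (s[0] / 8, s[1] / 8, s[2] / 64)               # scale down: x, y by 8 and z by 64
s = compose(s, s)                                 # second application of the identity
print(tuple(map(int, s)))  # (120, 1151, 1): indeed 92*120^2 + 1 = 1151^2
```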

Tackling $92x^2+1=y^2$ directly was done by Lagrange using the continued fraction of $\sqrt{92}$; he arrived at the same solution $(x,y)=(120,1151)$. (citation needed) Lagrange also seems to have considered $Nx^2+z=y^2$ for general $z$. [2]

[1]: A. K. Dutta, "The Bhāvanā in Mathematics", in Bhāvanā (2017), pp.13--19.

[2]: "Œuvres de Lagrange" vol. 1, pp.675--676, edited by Joseph Alfred Serret

$\endgroup$
