42
$\begingroup$

A function $f : \mathbb{R} \to \mathbb{R}$ is convex (or "concave up") provided that for all $x,y \in \mathbb{R}$ and $t \in [0,1]$, $$f(tx + (1-t)y) \le tf(x) + (1-t)f(y).$$ Equivalently, a line segment between two points on the graph lies above the graph, the region above the graph is convex, etc. I want to know why the word "convex" goes with the inequality in this direction, and how I can remember it. Every reason I have heard makes just as much sense applied to the opposite inequality ("concave down").
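
For concreteness, the inequality can be spot-checked numerically. The following is only a rough sketch: the helper name `satisfies_convexity_inequality`, the sampling interval, and the tolerance are all illustrative choices, and passing the check is evidence of convexity on that interval, not a proof.

```python
import math
import random


def satisfies_convexity_inequality(f, lo=-5.0, hi=5.0, trials=10_000, tol=1e-9, seed=0):
    """Spot-check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) on random samples.

    Returns False as soon as one sampled triple (x, y, t) violates the
    inequality by more than tol; returns True if no violation is found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        t = rng.random()
        # Height of the chord between (x, f(x)) and (y, f(y)) above the point t*x + (1-t)*y.
        chord = t * f(x) + (1 - t) * f(y)
        if f(t * x + (1 - t) * y) > chord + tol:
            return False
    return True


exp_passes = satisfies_convexity_inequality(math.exp)               # graph lies below its chords
neg_square_passes = satisfies_convexity_inequality(lambda x: -x * x)  # graph lies above its chords
```

Here `math.exp` passes the check while $-x^2$ fails it, which matches the convention that "convex" means the chord lies above the graph.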

$\endgroup$
3
  • $\begingroup$ You are compromising the utility of the notion by failing to relativize it to subintervals of the domain. About 80% of the time, what you are interested in in Calc 1 regarding concavity is where the sign of the concavity (equivalently, the sign of the curvature) changes, i.e., at any INFLECTION POINTS. $\endgroup$
    – Mike Jones
    Commented Sep 29, 2011 at 21:25
  • 3
    $\begingroup$ The epigraph of a convex $\mathbb{R} \to \mathbb{R}$ function is a convex set within "graph space" $(x,y)$, although this doesn't explain why we should look at the epigraph instead of the subgraph…. $\endgroup$ Commented Jun 19, 2014 at 14:13
  • 1
    $\begingroup$ I think sometimes as mathematicians we forget that some words we use do have normal "every-day" meanings. The definition of convex, in the every-day sense, means that a surface bulges out TOWARDS you as you look at it. And since the canonical orientations of the way we draw graphs have us think we are standing BELOW the graph, a convex function looks... well... convex. $\endgroup$
    – user123641
    Commented Mar 10, 2017 at 17:27

13 Answers

40
$\begingroup$

Not sure why convex is defined that way, but one way to remember is that the derivative of a (differentiable) convex function is monotonically increasing.

Or maybe just remember that $e^x$ is conv$e^x$. (I just thought of this one!)

$\endgroup$
1
  • 2
    $\begingroup$ 'conv is such an operator I've never seen before. Ah... wait a minute.' $\endgroup$
    – Hermis14
    Commented Dec 23, 2021 at 20:27
15
$\begingroup$

Let's say that you accept the definition of a convex set in higher dimensions, like a sphere in $\mathbb{R}^3$. The question I seek to provide insight into is why convex functions in one variable are defined as opening up instead of down, since this seems like an arbitrary definition. This is because, depending on how you look at the graph, you could naively view the function as bending outwards (like a convex set) or inwards (concave). However, there is a nice connection between these two things using metric spaces that I think can provide some meaning to the way it is defined.

Most of the metrics that you are familiar with have open balls that are convex, such as the standard metric. But some have open balls that are not convex. A good example of this is $d(x,y) = \sum \sqrt{|x_i - y_i|}$. (Note that $\sqrt{x}$ is not a convex function.)
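
To see the non-convexity concretely, here is a small sketch in $\mathbb{R}^2$. The specific points and the radius 1.1 are my own illustrative choices, not taken from the referenced paper.

```python
import math


def sqrt_metric(p, q):
    """d(p, q) = sum_i sqrt(|p_i - q_i|), the metric from the text."""
    return sum(math.sqrt(abs(a - b)) for a, b in zip(p, q))


origin = (0.0, 0.0)
y = (1.0, 0.0)
z = (0.0, 1.0)
# Midpoint of the segment from y to z.
midpoint = tuple((a + b) / 2 for a, b in zip(y, z))

radius = 1.1  # illustrative radius for the open ball centred at the origin
d_y = sqrt_metric(origin, y)           # sqrt(1) + sqrt(0)
d_z = sqrt_metric(origin, z)           # sqrt(0) + sqrt(1)
d_mid = sqrt_metric(origin, midpoint)  # 2 * sqrt(1/2) = sqrt(2)

# y and z lie inside the open ball, but their midpoint does not.
ball_is_not_convex = d_y < radius and d_z < radius and d_mid >= radius
```

Both endpoints are at distance 1 from the origin, while the midpoint is at distance $\sqrt{2}$, which exceeds the average of the two endpoint distances. So the open ball of radius 1.1 contains $y$ and $z$ but not the segment between them.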

Here is an interesting condition:

Given a metric $d$ on $E$, if for all $x,y,z\in E$ and $0\leq t\leq 1$,

$d \left(x, \ t y \; \, + \; (1-t) z \right) \quad \leq \quad t d(x,y) \; + \; (1-t) d(x,z) $

then the open balls formed by $d$ are convex. [1] In other words, if for each fixed $x$ the map $y \mapsto d(x,y)$ from $\mathbb{R}^n$ to $\mathbb{R}$ is a convex function, then the open balls are convex sets.

Usually $d(x,y) = \sum f \, (x_i,y_i)$, for some $f:\mathbb{R}^2\rightarrow\mathbb{R}$. If we fix $x$ and $f:\mathbb{R}\rightarrow \mathbb{R}$ fits the definition of a convex function, then $d$ will also be convex, and the condition will be satisfied, giving us convex balls.

So convex functions (if they can form a metric) will give you convex open balls. A nice connection that makes the definition make more sense. Other conditions that guarantee convex open balls are discussed in the paper I reference.

[1] Norfolk, T. (1991). When does a metric generate convex balls? www.math.uakron.edu/~norfolk/convex.ps

$\endgroup$
8
  • 1
    $\begingroup$ "I think you would agree that a sphere or any other convex shape fits your intuitive idea of what "convex"" Is this not circular? My intuition about what 'convex' is supposed to mean may be shaped by having seen the definition of convexity and by associating the definition to several known shapes. $\endgroup$
    – user116
    Commented Sep 1, 2010 at 16:50
  • 2
    $\begingroup$ @Srikant: "Convex" does have a standard meaning in English. To strengthen his point, GottfriedLeibniz might also appeal to the widespread consistent use of "convex" in many branches of mathematics. In effect, this views the question as probing for connections between an idea in one area of mathematics (single variable calculus) and possibly related ideas. $\endgroup$
    – whuber
    Commented Sep 1, 2010 at 17:00
  • $\begingroup$ @Srikant yes, in essence it's a circular statement. It wasn't really how I meant to word it. I will edit it. My point is, like @whuber mentions, that defining convex functions in one variable as opening upwards instead of downwards seems arbitrary. My answer is meant to show that the definition in one variable is consistent (in a way) with the definition of a convex shape in higher dimensions. Intuitive is probably the wrong word. In fact, I tend to loathe using the word "intuitive" because its meaning is so damn ambiguous. $\endgroup$ Commented Sep 1, 2010 at 18:32
  • $\begingroup$ I should add that any function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is defined as convex if its epigraph is a convex set. I don't think that has been mentioned here. $\endgroup$ Commented Sep 1, 2010 at 18:52
  • 1
    $\begingroup$ @GottfriedLeibniz: I'm aware of that, but why the epigraph and not the hypograph? $\endgroup$ Commented Sep 1, 2010 at 21:30
13
$\begingroup$

One of my professors told me the following memorable line: "A concave function looks like the roof of a cave." which helps me remember what is a concave and what is a convex function.

$\endgroup$
6
  • 1
    $\begingroup$ Yes, it's a great mnemonic, but does it really answer the question, which asks why? $\endgroup$
    – whuber
    Commented Sep 1, 2010 at 15:43
  • 4
    $\begingroup$ @whuber It does in part because the OP also asked 'how I can remember it'. $\endgroup$
    – user116
    Commented Sep 1, 2010 at 16:26
  • 1
    $\begingroup$ @Srikant: agreed, but that seems like the less significant part of the question to me. Mnemonics are extremely useful, whence their popularity, but in general they do nothing to help one's understanding. Answers like that of GottfriedLeibniz are much deeper and satisfying, even if (in some opinions) they might turn out to be incomplete or even wrong. $\endgroup$
    – whuber
    Commented Sep 1, 2010 at 16:58
  • 3
    $\begingroup$ @whuber I am not disputing your point about the 'why'. Whether it really answers the question is up to the OP to decide. I simply answered that part of the question where I thought I could contribute something. $\endgroup$
    – user116
    Commented Sep 1, 2010 at 18:32
  • 1
    $\begingroup$ Since this question has gone CW I unaccepted the answer. I like it as a mnemonic, but I still haven't seen a really satisfactory answer for "why". Answers posted so far seem just as applicable if you reverse them. $\endgroup$ Commented Sep 1, 2010 at 21:35
9
$\begingroup$

The primary concept is convexity, not concavity. It applies to geometric figures, originally lenses, and this usage was adapted to functions. There is no comparable concept of concavity for, say, 2-dimensional regions, except as the absence of the property of convexity. There is also no property for figures in general corresponding to the anti-convexity inequality, because most non-convex figures will be locally convex. It is a matter of historical convention that a function is called "convex" if the region above the graph of the function is convex, and it would have caused no mathematical problem to use the opposite convention based on the region below, but concavity is a more limited concept that is defined in terms of convexity (or only defined for functions) and not the other way around.

The terms "concave up" and "concave down" appear mainly in non-specialist US college textbooks on calculus. They are nonstandard terminology and, I think, bad practice that should be discouraged (with luck and sufficient ruthlessness maybe they can be squelched in a generation...). As far as I know the etymology went as follows:

  1. Like "convex", the word "concave" has a prior use in optics. Concave (inward-curved) lenses are the opposite of convex lenses, so there is a pre-existing word for "not convex" or "convex in the opposite direction".

  2. Convex has an absolutely entrenched mathematical use to denote convex figures as well as functions (and sequences) with increasing derivative.

  3. Functions whose negative is convex occur frequently and "concave [function]" came into use as a convenient description of this situation. The linguistic logic was clear enough to make this immediately understandable. It's not clear whether it was more or less favored compared to statements involving the negative, such as saying that $-f(u)$ is convex, or $f$ is anti-convex, or that it is the negative of a convex function. I don't have data at hand from web searches or anything like that, but I think concavity is less common as a description of negatively convex sequences. For functions the ability to draw a graph makes the resemblance to lenses clearer so that both words seem sensible. (added: concavity as a counterpart to convexity for functions and sequences also gained momentum as its own term once log-convexity and log-convex became standard usage. Because the relationship between log-convex and log-concave functions is not simply change of sign but a multiplicative inverse, using only the words based on convexity might lead to confusion or circumlocution.)

  4. Authors of US college calculus textbooks, writing for an audience not familiar with or necessarily interested in convex figures and optics, and aware of potential for confusion (e.g., the graph of a concave function still bounds a convex-shaped region, or the subsequent use of convex to describe functions of several variables and the regions on which those are defined) cooked up a terminology based on "concavity" as a stand-alone concept, limited to the one-variable context where $f(x)$ is graphed with the $y$-axis direction being upward. It's not clear how consistent this concave-up and concave-down terminology is between books and whether it agrees with the earlier, non-confusing use of concave to denote negative convexity.

$\endgroup$
9
  • 2
    $\begingroup$ What's so bad about "concave up" and "concave down" in Calc 1? I've never had a student mix up which is which. That's surely more important than purity. $\endgroup$ Commented Dec 13, 2010 at 1:05
  • 4
    $\begingroup$ "with luck and sufficient ruthlessness maybe they can be squelched in a generation..." - I gave a +1 solely because of this statement... :D $\endgroup$ Commented Dec 13, 2010 at 5:18
  • 4
    $\begingroup$ @Lao: one bad thing is that a convex function is "concave down" (according to wikipedia's article en.wikipedia.org/wiki/Convex_function), while also being the opposite of a concave function. Adding new terminology inconsistent with older, more general, useful, established and immovably entrenched terminology is a step backward. There is nothing you can express with "concave up/down" that cannot be expressed as easily with "convex", and the latter term has the advantage of carrying additional associations that can be used to reinforce the meaning. $\endgroup$
    – T..
    Commented Dec 13, 2010 at 5:38
  • 1
    $\begingroup$ When transitioning from mechanical calculus to analysis, I found the concave up/concave down meme that I had learned completely unhelpful. I believe that students should be taught correct, accurate terminology from day one; otherwise, it's just confusing when they advance to higher levels. $\endgroup$ Commented Jun 9, 2011 at 18:25
  • 2
    $\begingroup$ @T..: It looks like you misread the Wikipedia article. The relevant portion is: " a real-valued function f(x) defined on an interval is called convex (or convex downward or concave upward" $\endgroup$
    – Mike Jones
    Commented Sep 29, 2011 at 19:49
8
$\begingroup$

With the caveat that it's usually more helpful to devise your own mnemonics than follow someone else's, here are a couple of mine, poorly drawn (the second is the same as Srikant Vadali's answer):

[Images: convex function; concave function]

  • convex: smiley face

  • Another way of remembering them, if you recall the meanings of convex and concave outside mathematics (as in lenses, etc.), is that you look from below: if the graph of the function viewed from below looks convex (i.e., bulging towards you) the function is convex, if it looks concave the function is concave.

  • Yet another way is to keep in mind the definition: "a function is convex if its epigraph is a convex set". The epigraph is the set of points lying above the graph, and a convex set is one in which every line segment between two points in the set lies within the set. [Actually, for me, this definition is more useful for remembering what epigraph means :-)]
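
For reference, here is a short sketch of why the epigraph definition agrees with the usual inequality. The notation $\operatorname{epi} f = \{(x,s) : s \ge f(x)\}$ is introduced here only for this argument.

Suppose $f$ satisfies $f(tx + (1-t)y) \le tf(x) + (1-t)f(y)$. If $(x,a)$ and $(y,b)$ lie in $\operatorname{epi} f$ and $t \in [0,1]$, then

$$f(tx + (1-t)y) \le tf(x) + (1-t)f(y) \le ta + (1-t)b,$$

so the point $t(x,a) + (1-t)(y,b)$ is again in $\operatorname{epi} f$, and the epigraph is a convex set.

Conversely, if $\operatorname{epi} f$ is convex, apply this to the points $(x,f(x))$ and $(y,f(y))$, which both lie in it. Their convex combination

$$\bigl(tx + (1-t)y,\ tf(x) + (1-t)f(y)\bigr)$$

must lie in $\operatorname{epi} f$, which says exactly that $f(tx + (1-t)y) \le tf(x) + (1-t)f(y)$.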

$\endgroup$
5
  • 2
    $\begingroup$ "Epi" means "above." $\endgroup$
    – whuber
    Commented Sep 1, 2010 at 17:01
  • 1
    $\begingroup$ So why shouldn't a convex function be one whose hypograph is convex? $\endgroup$ Commented Sep 1, 2010 at 21:31
  • $\begingroup$ @Nate: I was answering the “how I can remember it” part of your question, not “why the word "convex" goes with the inequality in this direction”. Of the five mnemonics I gave in the answer for remembering what "convex" means, the last is clearly the poorest, since it essentially requires you to memorize a definition. So what? :-) $\endgroup$ Commented Sep 3, 2010 at 22:09
  • 1
    $\begingroup$ A problem with using the 'v' in convex is that concave also has a 'v'. It thus reduces to con$\textbf{cave}$ and the other one. $\endgroup$ Commented Oct 5, 2013 at 5:39
  • $\begingroup$ For some reason, the epigraph being convex is the one that made me remember this forever. Maybe there is some hidden psychological reason? $\endgroup$
    – Aloizio Macedo
    Commented Dec 19, 2015 at 3:55
3
$\begingroup$

It is always good to go back to the source. Modern treatment of convex functions can be traced back to a paper by J. L. W. V. Jensen, “Om konvekse Funktioner og Uligheder mellem Middelværdier,” Nyt Tidsskrift for Mathematik 01/1905; 16 B. It can be found in Google books. The definition simply says:

$\phi(x) + \phi(y) \geqslant 2 \phi(\frac{x+y}{2})$

A convex function does not require differentiability, nor continuity. It can be defined in any metric space with geodesics, where a "middle point" is well defined. The essential idea of a convex function is therefore its property of bulging out towards the "outside." (Average of the function is larger than the function of the average.) If we have to use a more graphical name, nowadays we would probably call it a centrifugal function instead, since it bulges out from the center of any interval.
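
As a quick sanity check of this midpoint definition (an illustrative example, not one taken from the paper), take $\phi(x) = x^2$:

$$\phi(x) + \phi(y) - 2\phi\left(\frac{x+y}{2}\right) = x^2 + y^2 - \frac{(x+y)^2}{2} = \frac{(x-y)^2}{2} \ge 0,$$

so $x^2$ is convex in this sense. The same inequality is what the more common definition $\phi(tx + (1-t)y) \le t\phi(x) + (1-t)\phi(y)$ reduces to at $t = \tfrac12$, after multiplying through by 2.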

Now, the shape of a convex function is clearly counter-intuitive. If we view the bottom of its curve as the base, as in the road bump sign image from Wikipedia, then clearly a convex surface would correspond to what we would call a "concave" function, unless we assume the opposite and take the top of the curve as the base, which makes it all the less intuitive. The same awkwardness is observed in an ideographic language such as Chinese, where the character for convexity is 凸 (Pinyin tū, a bump) and concavity is 凹 (Pinyin āo, a dip).

So how can we intuitively remember and visualize the shape of a convex function? How can we reconcile the fact that this function is bulging out, with its actual shape? It's actually rather simple. Think of a laundry machine with a vertical cylinder. When the cylinder spins as in the drying cycle, the water surface will rise towards the edge of the cylinder due to the centrifugal force, and the water will be pushed out towards the edge. So the right interpretation of bulging out is not in the vertical direction, but in the horizontal direction! We simply interpret the function's value as the pressure or centrifugal force.

For those interested in seeing the shape of the water surface in a spinning tank, YouTube has several video clips. Here is one of them: Centrifugal Force on Rotating Water Container. After seeing this video, and understanding the horizontal bulging out, you will probably never forget the shape of a convex function.

$\endgroup$
2
$\begingroup$

A line is said to "support" the graph of a function (or indeed, any subset of the Cartesian plane) if it "holds up" the graph: that is, the graph lies entirely above or on the line. (After all, gravity pulls downward!) We might think of the union of all support lines as the "ground" on which the graph lies; everything else--its set-theoretic complement--is the "sky".

A function of the real numbers is convex if and only if its graph is the boundary between the ground and the sky. This is a special case of the more general idea of convexity that applies to arbitrary planar regions, the same as the familiar distinction between a convex and non-convex polygon, for example. For arbitrary regions there is no definite "up" and "down" anymore, though, so we say that a line supports a region when the region lies entirely within one of the two closed half-planes bounded by that line. (Thus, the interior and boundary of a convex polygon form its "sky" and everything outside is the "ground.")

In short, calling a "concave upward" function "convex" unites two closely related familiar concepts and is justified by the universal earthbound human experience that gravity usually pulls downward.

$\endgroup$
4
  • $\begingroup$ This is a nice answer but is gravity really a necessary part of the explanation? It doesn't sound like it to me. $\endgroup$ Commented Dec 12, 2010 at 4:10
  • $\begingroup$ @Lao Tzu Given that this is psychology, not mathematics, there is something to be gained by appealing to innate human sensory processing skills. These include a strong up-down orientation and a highly refined ability to detect horizons (level, linear features). Both play a role in our conventions for graphing and describing functions. So gravity is not necessary but it is informative. $\endgroup$
    – whuber
    Commented Dec 12, 2010 at 16:35
  • $\begingroup$ From my perspective, the answer is more of a philosophy rather than a reliable answer. $\endgroup$
    – wayne
    Commented Mar 23, 2016 at 10:13
  • $\begingroup$ @Wangyan So is the question. $\endgroup$
    – whuber
    Commented Mar 23, 2016 at 13:54
1
$\begingroup$

Instead of thinking about the graph of $f: \mathbb{R} \to \mathbb{R}$ as a 2-D object in the plane, think about $f$ mapping one number line onto another.

drawing of a convex function from ℝ to ℝ

I'll return to that picture, just notice that there is plenty of "room" so to speak and that the arrows mapping point to image will never "fall back" on each other, overlap, or "crowd in" so long as the function is convex.

Imagine a closed loop in the plane whose interior is non-convex. It's like a deflated balloon. You need to "blow it up" until it's at least modestly ($\leq$) full of air for the interior to be convex. Similarly if you had a non-convex polygon and "blew air" inside of it, you would get a convex shape. So it's like convex shapes have to be sufficiently "inflated".

Similarly, the arrows in an $\mathbb{R} \to \mathbb{R}$-type picture like the above would be "flopping" inward onto each other, deflated if you will. In the convex mapping the arrows don't overlap at all -- they've got "pressure" or "energy" pushing them outward enough so that they don't overlap. So the image is properly inflated, if you will.

So @NateEldridge, the epigraph being a convex set is a red herring. Think about just the right-most point of a graph as it's being generated by a s-l-o-w graphing calculator. The image has to "outrun" the domain it comes from by $\geq$ each $dt$. And there you have your $f(\mathrm{interior\ of\ domain}) \leq \mathrm{image}_1 + \mathrm{image}_2$.

This is meant as an elaboration on @whuber's answer.

$\endgroup$
4
  • 1
    $\begingroup$ Can you make this a little more precise? I'm having trouble seeing how your interpretation applies to the function $f(x) = x^2$, which is convex, but whose arrows do "overlap" and "crowd in" around 0, unless I'm misunderstanding what you mean by those words. $\endgroup$
    – user856
    Commented Dec 12, 2010 at 12:36
  • $\begingroup$ You may have found an error in my explanation. Or maybe I need to add an "inverse interpretation" for $|\mathrm{input}|<1$. I do still believe that the $\mathbb{R}^2$-ish picture is a red herring, though. $\endgroup$ Commented Dec 13, 2010 at 1:02
  • $\begingroup$ Maybe I can steal from @T.. and say that "strictly increasing derivatives" corresponds to "blowing up the balloon" whereas derivatives that "lack the internal pressure" are "deflated" like the loop or balloon above. $\endgroup$ Commented Dec 13, 2010 at 1:10
  • $\begingroup$ Maybe a better way to say the above would have been to use the words "nonnegative curvature" $\leftrightarrow$ convex set, and "negative curvature" $\leftrightarrow$ non-convex. $\endgroup$ Commented Mar 18, 2011 at 10:57
1
$\begingroup$

I struggled with this too until I realised it was because I was so used to thinking of the x axis as the reference line I didn't question its relevance here. We learn integration mostly with positive valued functions in the top right quadrant.

Both the definition and the name have to work for any function, and of course there are infinitely many such functions in the bottom right quadrant (some even stay there) and relative to the x axis they are convex in the "intuitive sense".

So, in order to feel comfortable with the convention, I find it helps to remember that the definition uses the line between two points on the function as the reference for very good reason: this works consistently for all curves, no matter where they are.

The x axis is a red-herring when thinking about function properties like this. Once you realise that and think in relative not absolute terms (i.e. don't draw the x axis in your mind), it's a lot easier to see that the name makes complete sense and is consistent for all functions.

Hope that helps!

$\endgroup$
0
$\begingroup$

This is what I tell my students. We know what a convex set is, and we need a name for functions satisfying the condition above. By (verbal) analogy we call them convex, too. But in that case the curve is the bottom (down) part of a convex region, so we can say that convex means convex down. But then concave up should equal convex down, i.e., the curve is the top (up) part of a concave region. Concave down then equals convex up, meaning that the curve is the lower part of a concave epigraph or the upper part of a convex hypograph. Hope this helps!

$\endgroup$
0
$\begingroup$

Consider any plane simple differentiable loop. We say that this is convex if one can draw a straight segment connecting any two points, without leaving the "inside" of the loop. Convexity is just the name of this property, a way -if you like- to spend less time conveying the meaning.

Now, by Dini's theorem the support (or graph, the actual line in the plane) of your loop is locally the graph of a function. Of course this can be either x = x(y) or y = y(x), so that one might have to be careful in rotating and reflecting the drawing in the former case.

For simplicity's sake we will restrict to the latter, y = y(x). Any convex loop you can draw will be, in these neighborhoods, 'smiling'. This is because if it started frowning, we could easily draw a line that 'breaks through' our loop.

The 'reason' why the upper side of the graph is chosen to have this property is basically a visual one: take the same convex loop as before and notice that the upper side of our loop always corresponds to the 'inside' of the loop, whether in neighborhoods of the form y = y(x) or x = x(y).

Remark: this is obviously very simplistic and the definitions I give are not really canonical, but I thought it was a pretty argument from a very informal point of view.

$\endgroup$
0
$\begingroup$

One can think up reasons for "convex" to refer to the region above the graph, but all seem ad hoc, and "tweakable" so they refer to the region below it. We need to find the source, and her/his reasons. The most frequently used property of convex functions that I know of is Jensen's inequality. This was 1906. Presumably the source is several years before that, but I haven't found it.

$\endgroup$
0
$\begingroup$

I'm guessing it's a convention. Not everyone seems to follow it though, e.g. here, taken from Bishop's Neural Networks for Pattern Recognition:

[Image: excerpt from the book]

$\endgroup$
