Why do we use functional composition in the order we do?

Question

Function composition means, roughly, taking the output of a function and applying it to the input of another function. If we define an object C to represent this operation, we could say $C(f,g) = f∘g$ or $C(f,g)(x) = f(g(x))$.

Say we had a similar but slightly different operation $F$ (for “followed by”) such that $F(f,g) = g∘f$ or $F(f,g)(x) = g(f(x))$. We could make a symbol for this operation, maybe $→$, so that what we wrote before reads as $F(f,g) = f → g$. This could read as “$f$ followed by $g$” or “$f$ and then $g$”.

My suspicion is that $F$ is a much more natural and intuitive operation than $C$, since (in English) we read from left to right, so seeing $f*g$ feels more like $f$-then-$g$ than $g$-then-$f$. My second suspicion is that $F$ and $C$ are entirely equivalent and translating between use of one and the other is trivial. To my knowledge and experience, composition is a widely used operation, but followed-by is not. Why?

Are there certain branches of maths, specific proofs, or particular applications that benefit by using $C$ rather than $F$? Is anything known about the origins of this notation? Is followed-by actually used in some context I might not be familiar with? As followed-by is much more intuitive to me and makes certain problems a lot more obvious, might it be a good idea, every time I see $f∘g$ to replace it with $g→f$ or are there cases where that could lead me astray or make things more complicated?

I think of it more as working from the inside outward than as a left or right direction. — Sue VanHattum, Commented Feb 22, 2022 at 2:20
Maybe what I mean to say is that to me the notion of "after" is easier to wrap my head around than the notion of "inside", and I suspect that they are equivalent formulations — David Lalo, Commented Feb 22, 2022 at 2:39
Indeed. Reasonable question! And, indeed, some decades ago, some people (e.g., Israel Herstein) proposed to write the function symbols on the right, as in $((x)f)g$, so that our left-to-right reading would match the inside-out reading. But it didn't catch on... There are many more left-versus-right and outside-in-versus-inside-out parsing issues in math. Mostly of not-so-much mathematical content, but, now-and-then, a bit. — paul garrett, Commented Feb 22, 2022 at 2:39
... and, in fact, as in the situation of looking at group actions on sets, and then the induced action on functions on the sets, there is an inescapable left/right $g$-or-$g^{-1}$ issue. Similarly with linear algebra over non-commutative rings: (notationally) "right" modules over $R$ are "left" modules over "the opposite ring"... And Hom spaces and tensor products have such "problems". Mercifully, we seem not to have to worry so much about "top", "bottom", "front", "back", or other physical-configuration modules... though these are not entirely fake, I gather! Don't know, personally. :) — paul garrett, Commented Feb 22, 2022 at 2:44
Another little point that may clarify some aspects of this is to declare explicitly that the notation is a choice of language/notation, for a narrative about some mathematics that is itself not just whatever the notation is. The notation is "a book-keeping system", that is surely not unique, and could have deficiencies, etc., but those problems need not reflect on the underlying mathematical reality (if we believe there is one). :) — paul garrett, Commented Feb 22, 2022 at 2:54

Xander Henderson · Accepted Answer · 2022-02-24 20:28:41Z

The answer to the lede question "Why do we use function composition?" is "Because we do." It is the way that, for historical reasons, the notation developed. It is a convention which developed before a lot of set theory, modern algebra, and category theory came along, and this convention was deeply ingrained long before anyone thought to question it.

From a naïve point of view, I also think that it makes a good deal of sense: if $f(x)$ denotes the application of a function $f$ to an input $x$, then $g(f(x))$ must denote the application of a function $g$ to an input $f(x)$. Reading from left-to-right, we have "$g$ of $f$ of $x$", which we might abbreviate by $(g\circ f)(x)$.

From a more set theoretic point of view, this becomes a little harder to reason about. A typical schematic for functional composition in undergraduate texts looks something like the following:

Reading from left-to-right, we might expect the composite function to be "$f$ and then $g$", which we might want to denote by $f \ast g$ or something similar. But the convention is that this map is $g \circ f$—for historical reasons, maps are applied from right-to-left.

For the most part, this doesn't really cause any problems, but there are some places where you might see some difficulties or a different choice of notation:

Category Theory

In chalk-talks, I have seen one or two category theorists use the notation $(x)f$ to denote the image of $x$ under the map $f$. In this setting, $$ (x)(f\circ g) = ((x)f)g, $$ which is what we would usually write at $g(f(x))$. So, if you want to compose on the right (as seems "natural"), you should also apply functions on the right.

From one point of view, I suppose that this is a perfectly reasonable convention to adopt (and I kind of wish that we had adopted this convention in the first place), but it is probably a very bad idea to teach this convention to a neophyte.

Differential Operators

Folk who study differential equations will often use notation like $\partial_{xy}$ in order to mean $$ \partial_{xy} = \partial_y \circ \partial_x, $$ where $$ \partial_{x} f(x,y) = \frac{\partial}{\partial x} f(x,y). $$ Or should that be $$ \partial_{xy} = \partial_x \circ \partial_y?$$ There is room for ambiguity here, and authors are well-served to be explicit about what they mean. On the other hand, in most "nice" cases, the derivatives commute, so maybe it doesn't really matter.

Iterated Function Systems

In my own area of research, a common object of study is an iterated function system. Such a system is simply a collection of maps $\{\varphi_j\}_{j=1}^n$ which map some space into itself. The maps are often composed with each other over and over again (hence the "iterated" part of the term), which leads to all kinds of wild notations for composition. One common notation is to take $\mathbf{j} = (j_1, j_2, \dots, j_k)$ to be a tuple of indices, and then write $$ \varphi^{\mathbf{j}} = \varphi_{j_k} \circ \varphi_{j_{k-1}} \circ \dotsb \circ \varphi_{j_2} \circ \varphi_{j_1}. $$ That is, first apply the map $\varphi_{j_1}$, then apply the map $\varphi_{j_2}$, and so on. However, people mess up this convention all the time. I cannot tell you the number of times that I have seen the convention defined one way, and then applied the other. This doesn't usually cause problems, because the underlying reasoning is generally correct—it is just a problem of transcription—but it can be a concern.

Reverse Polish Notation

I grew up using an HP48 series calculator. These calculators were somewhat unique, in that they implemented computation via reverse Polish notation. The idea is that the arguments of a function are placed onto a "stack", and functions apply to the items on the stack in a "last in, first out" order. So, for example, if I wanted to compute $\sin(\log(5) + 3)$, I might enter the keystrokes

[5]     <- this puts 5 on the bottom of the stack
[log]   <- the "eats" the 5, and puts log(5) on the bottom of the
           stack
[3]     <- this "bumps" log(5) up a spot, and puts 3 on the stack
           below log(5)
[+]     <- this "eats" the bottom two elements of stack, adds
           them together, and puts the result on the bottom of
           the stack
[sin]   <- this "eats" the bottom element of the stack, takes its
           sine, and places the result on the bottom of the stack

Written out, the set of keystrokes is "composition in the reverse order", i.e.

5 log 3 + sin

roughly corresponds to writing $(((5)\log, 3){+})\sin$, which kind of looks like backwards composition. Using the standard notation we might write $\sin( {+}(\log(5), 3))$. In both cases, we think of $+$ as a function $+: \mathbb{R}^2 \to \mathbb{R}$.

This may look arcane at first glance, but once you get used to it, it is a really convenient way to enter computations into a calculator or computer. It eliminates a lot of errors caused by missing parentheses.

TL;DR

The current convention exists for historical reasons. If we were to start all over again, we might adopt a different convention, and there are good reasons to want to do so. However, the convention exists, is well-established, and you would be doing your students a disservice to introduce an alternative notation.

I would add a "folk history" to this: probably we had particular functions first, like sine. Since we would natural write "sine of x", abbreviating this $\sin(x)$ would make sense. Then this convention would naturally lead to $f(x)$ for general functions. As soon as we are writing $f(x)$ rather than $(x)f$, it makes sense to write function composition in the usual order. This is my best guess as to how we ended up where we are. I do wish Herstein's attempted notational revolution had caught on. — Steven Gubkin, Commented Feb 23, 2022 at 11:29
Great answer, love these examples! And not just because I've also studied Iterated Function Systems :-) — Brendan W. Sullivan, Commented Feb 23, 2022 at 19:58
For the slightly less adventurous category theorists, it's also somewhat common to write $f(x)$ as usual but use a semicolon, as in $f;g$, to denote composition "the other way around". — Daniel Wagner, Commented Feb 24, 2022 at 20:07

Matthew Daly · Accepted Answer · 2022-02-22 04:33:07Z

14

In general applications, defining the evaluation rule of $f\circ g$ by $(f\circ g)(x)=f(g(x))$ has a lower extraneous cognitive load than the way you suggest. As a sample of how inconveniencing that can be, go ahead and proofread the two following sequences of equations to see if there is a mistake in my exposition.

$$((f\circ g)\circ h)(x)=h((f\circ g)(x))=h(g(f(x)))=(g\circ h)(f(x))=(f\circ(g\circ h))(x)$$

$$((f\circ g)\circ h)(x)=(f\circ g)(h(x))=f(g(h(x)))=f((g\circ h)(x))=(f\circ(g\circ h))(x)$$

Esoteric rules of symbolic manipulation aren't bad in and of themselves, especially if it ultimately leads to a clearer understanding of the underlying mathematical structures. But I don't think that's the case here: as far as I can see, all you get is something that is easier to translate into English step-by-step. But IMO, the strength of mathematical symbolic language is that it offers efficiency and clarity away from the vagaries of natural language.

That being said, there are subfields of mathematics where that cognitive load is more intrinsic than extraneous and therefore. For instance, when studying permutation groups, it is unclear whether the product $fg$ of two permutations is most naturally understood as "do $f$ first and then $g$" or symbolically as $(fg)(x)=f(g(x))$. As the comments suggest, I am given to believe that some old-school authors in abstract and linear algebra used postfix functional notation to establish the clarity.

answered Feb 22, 2022 at 4:33

Matthew Daly

5,6291 gold badge12 silver badges45 bronze badges

1

$\begingroup$ I guess this example was helpful in that making me work through an example, shows that there is an inconsistency since function application (ie the putting of inputs into a function) happens on the right whereas thinking of nested function application as a series of sequential steps we'd expect input on the left $\endgroup$
– David Lalo
Commented Feb 22, 2022 at 8:11
1

$\begingroup$ @DavidLalo that notation would be inconsistent: does $f\to g$ mean you use $f$ as the argument to $g$, or compose it before $g$? However, you could use something like $x\to(f;(g;h)) \equiv x\to ((f;g);h)$. In functional programming languages you often have both directions available as different operators – in Haskell, h(g(f(x))) can also be written as h $ g $ f x or (h . g . f) x (the . corresponds to the \circ composition operator), or x & f>>>g>>>h. $\endgroup$
– leftaroundabout
Commented Feb 23, 2022 at 15:33
1

$\begingroup$ hmmm interesting. I think I was thinking of something similar to currying a partial function and the order of operations would be determined by parens (although I think the associativity of function application, would make these unnecessary?) My thinking is it's contextual -- f → g is inherently unevaluable (whereas x → f → g, is an element in codomain of g) but represents the map that starts in the domain of f and ends in the codomain of g. But I need to chew on Your point a bit more to really get it. $\endgroup$
– David Lalo
Commented Feb 23, 2022 at 16:28
1

$\begingroup$ In a book on intro to category theory I once started reading they treat inputs x to functions as maps from a one-element set to domain of x. Thinking of it this way I think might remove some problems of "an element of a set and a function are fundamentally different things so why are we using the same operation on both in the same way" $\endgroup$
– David Lalo
Commented Feb 23, 2022 at 16:31
2

$\begingroup$ In programs, functions are usually procedures to execute. Left-to-right evaluation is easier to follow. In javascript some propose an operator |>. x |> f means f(x). If you replace f(x) by x▹f and f◦g with g▹f, it becomes beautiful. (x▹f)▹g = x▹(f▹g). Consider x to be the parameterless function that returns the value of x (as David mentions). $\endgroup$
– Florian F
Commented Feb 24, 2022 at 17:36

| Show 7 more comments

Acccumulation · Accepted Answer · 2022-02-22 21:59:09Z

Put in natural language terms, if your language has verb-object order, then verb1-verb2-object naturally represents verb1 (verb2-object). That is, apply verb2 first, then verb1. For instance, if you're writing a recipe, "cook the chopped onion" means "chop the onions, then cook them". If you want someone to peel the onion, then chop it, then cook it, then serve it, that would be "serve the cooked chopped peeled onion". If your language has object-verb order, then it's natural for composition to go the other way. Whatever direction your object to verb order is, that's your composition order, because the verb+object becomes the object for the next verb. Math is somewhat inconsistent as to which order is used; "x squared plus two" is basically object-verb order, while "f of x" is verb-object order.

I think in English, in everyday use, those prenominal modifiers would normally (if at all) be in temporal order, thus: "serve the peeled, chopped (and then) cooked onion(s)". — Pablo H, Commented Feb 23, 2022 at 15:27

Andrey Tyukin · Accepted Answer · 2022-02-24 21:04:32Z

I think it's worth noting that $g\,;f$ is sometimes used instead of $f\circ g$ in certain areas that have connections to computer science:

In category theory: $f\,;g$ will sometimes be used for the composition of morphisms $f: A \to B$, $g: B \to C$, see for example this introductory text
In computer science proper: Again, $g\,;f$ might be used for composition of functions / partial functions / relations. For example: $[\mathit{-}]c[\mathit{-}] := post(c)\,;\supseteq$ (from here).

The main reason why it's not used more widely is simply due to convention: Had the mathematicians assumed some kind of postfix / "method notation" (e.g. $x.f.g.h$) instead of $h(g(f(x)))$, then the order $fst ; snd$ would probably seem more natural than $snd \circ fst$.

Regarding the prefix notation for functions, I believe that we can safely blame that on the early physicists. This is because in physics, one often has to work with some "important" quantities $F$, which depend on the "less important" parameters $a, b, \dots, z$, so that the $F$ comes first in an expression $F(a, b, \dots, z)$. You can see that the parameters really are "less important" from the fact that we often encounter abuse of notation where we are freely jumping between e.g. $F(x, y)$ and $F(r,\phi)$, where the "important quantity" $F$ is denoted by the same symbol, whereas the "less important" parameters don't even range over the same domain.

Stack Exchange Network

Why do we use functional composition in the order we do?

4 Answers 4

Category Theory

Differential Operators

Iterated Function Systems

Reverse Polish Notation

TL;DR

Not the answer you're looking for? Browse other questions tagged
mathematical-pedagogy
notation
functions
or ask your own question.

Hot Network Questions

Why do we use functional composition in the order we do?

4 Answers 4

Category Theory

Differential Operators

Iterated Function Systems

Reverse Polish Notation

TL;DR

Not the answer you're looking for? Browse other questions tagged mathematical-pedagogynotationfunctions or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
mathematical-pedagogy
notation
functions
or ask your own question.