$\begingroup$

EDIT (two years later): I was saddened to realise that no-one seems to care at the school level. Everything I thought might be a problem ended up as a non-issue because no-one challenged anything. The word limit is used in class but no one stops to ask what is actually meant by it.


Background: I am new to this site, but have 1500 reputation on the main Maths Stack Exchange site. I am (age-wise) a secondary-school student of maths, but for a very long time I have been learning informally at home, studying linear algebra, calculus and formalism that go far beyond the A-level and Further Maths curricula. “A-level”, for non-UK readers, refers to the qualification studied by students aged roughly 16 to 18, who are just starting to learn calculus without any real formal background. I’ve seen the textbooks, and they really just state facts without explaining them! I have been helping my friends with their maths for a long time and have acted as an informal tutor to them. I’ve recently taken up tutoring for work experience, and I hope to start tutoring A-level students online soon. That may look very pretentious and arrogant, but for context, my teachers said I didn’t even need to turn up to A-level classes when I formally “begin” the A-level next year.

Question:

Anyway, I am not a veteran teacher by any means. After the summer, when I’m tutoring for money or helping my friends, the courses will inevitably get to differential calculus.

But when I was studying introductory calculus, I remember being extremely aggravated by not being told the reasons for things. In particular, the thing that really got me and held back my understanding for at least a month or so was: “but aren’t these calculus mathematicians just constantly dividing by zero? How can anyone call this formal! It’s sloppy maths!!” And while that is obviously a false statement that only a naive student would make, I couldn’t get over it - until I did. I can’t remember exactly what helped me: the idea that works for me now is the picture of the derivative as a limit, and the epsilon-delta definition is my friend here, but importantly it will not be taught to my friends or tutees during their time at secondary school!

So, I am dreading the day that one of my observant friends or tutees asks: “but aren’t we just dividing by zero?”, because I can’t give a formal response, and I have no proper educational background, so I don’t know what intuitive response actually works for pupils. I could point them to 3Blue1Brown, but he does step into formalism here and there, which is excellent for students' learning and general appreciation but potentially scary for students just being introduced to calculus. Does anyone have any ideas, backed by experience, for how to take a curious student past the “derivative paradox”? It stalled my learning for a long time, and for an A-level student with hardly any time at all that could be disastrous. I don’t have the heart to tell them to just bottle up their curiosity and plough on with the course: that isn’t maths, that’s oppressive textbook education!

As a concrete example (but I invite the answerer to not focus on solely this example):

The power rule for integer powers (and for non-integer powers too, though base A-level doesn't cover the general binomial series needed for that) is very easily proven like this (this is a proof I thought of myself a while ago; I hope it's correct!)

$$\begin{align}\lim_{\delta x\to0}\frac{(x+\delta x)^n-x^n}{\delta x}&=\lim_{\delta x\to0}\frac{\sum_{k=0}^n{n\choose k}(\delta x)^k\cdot x^{n-k}-x^n}{\delta x}\\&=\lim_{\delta x\to0}\frac{(x^n-x^n)+n\cdot\delta x\cdot x^{n-1}+o(\delta x)}{\delta x}\\&=nx^{n-1}+\lim_{\delta x\to0}\frac{o(\delta x)}{\delta x}\\&=nx^{n-1}\end{align}$$
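A quick numerical sanity check of the limit above: this Python sketch computes the difference quotient exactly with `fractions.Fraction` for shrinking $\delta x$. The values $n = 5$ and $x = 2$ and the helper name `difference_quotient` are illustrative assumptions. The gap from $nx^{n-1}$ shrinks roughly in proportion to $\delta x$, which is the concrete content of $o(\delta x)/\delta x \to 0$; a finite table like this is evidence, not a proof.

```python
from fractions import Fraction

def difference_quotient(x, n, dx):
    """Slope of the chord of y = x**n between x and x + dx (exact rational arithmetic)."""
    return ((x + dx) ** n - x ** n) / dx

x, n = Fraction(2), 5
target = n * x ** (n - 1)  # the claimed derivative n*x^(n-1), here 80

for k in range(1, 6):
    dx = Fraction(1, 10 ** k)
    gap = difference_quotient(x, n, dx) - target
    # gap/dx stays bounded (it tends to C(n,2)*x^(n-2) = 80), so the gap itself -> 0
    print(f"dx = {dx}: gap = {float(gap):.6g}, gap/dx = {float(gap / dx):.6g}")
```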

Now my younger self would have had several quibbles with this: "why is $(\delta x)^2$ "smaller" than $\delta x$? $0$ is just $0$, right? And why are we doing maths where there are zeros on both the numerator, and the denominator, that's undefined,... right? What do you mean by vanishingly small?"

And I am personally a great believer in trying to prove things for oneself when learning: whenever a Wikipedia article or a YouTube video states some fact, I always try to prove it for myself before continuing. This is very rewarding, as it hones my formalism and understanding, and I would love to ask a friend or tutee to prove the power rule for derivatives using what they know about the binomial expansion, walking them through it but still leaving the burden of proof with them... but judging by what I've seen of the A-level textbooks, I don't think I can. There are just too many unanswered quibbles!

Any advice is appreciated here. I want the best for my friends/tutees, and unanswerable quibbles due to the necessary evils of standardised, institutional education could damage their motivation/confidence that maths actually works.

To be very very precise: my question is about how to specifically soothe a curious student quibbling about the derivative “paradox” in the same way that I once (detrimentally, with no one to guide me) did. I’m not interested in general ideas of whether we should or shouldn’t push rigour into introductory calculus.

$\endgroup$
  • 12
    $\begingroup$ Also, the limit definition of the derivative and the proof that the derivative of $\sin$ is $\cos$ are on the A-level syllabus. $\endgroup$
    – A. Goodier
    Commented Jul 28, 2021 at 12:37
  • 12
    $\begingroup$ You may be amused by the 1734 critique of calculus by the bishop George Berkeley (yes, that Berkeley) called "The analyst, or, a discourse addressed to an infidel mathematician." A lot of the complaints he has are similar to the ones you had as a student, and the text is also freely available. quod.lib.umich.edu/e/ecco/004796094.0001.000/… $\endgroup$ Commented Jul 28, 2021 at 12:53
  • 4
    $\begingroup$ Take a look at this blog post by Gowers about A-level maths. The comments to the blog are also very interesting! gowers.wordpress.com/2012/11/20/… $\endgroup$
    – user52817
    Commented Jul 28, 2021 at 13:37
  • 18
    $\begingroup$ @JochenGlueck: I think OP's fellow 18-yo students' problems are at an immensely lower level than you're thinking about. These are people who were just introduced to the word "limit", haven't grokked its meaning, and when their eyes see $\lim_{a \to 0}$, their brain reads it as $a = 0$. $\endgroup$ Commented Jul 28, 2021 at 15:37
  • 9
    $\begingroup$ @FShrike most of my students accept what they are taught and have little interest in knowing things beyond the scope of the course. I have had a few students who really desire to master the material deeply, and all of these have been Further Maths students. $\endgroup$
    – A. Goodier
    Commented Jul 28, 2021 at 15:45

20 Answers

$\begingroup$

There is no royal road to geometry. - Euclid

Nor calculus.

The essence of calculus thinking is really the limit concept. One needs to wrap one's mind around that. Formally: it's the core technique that defines derivatives and integrals. Poetically: it's the eye-of-the-needle through which you must pass to get to the next level of mathematics.

There are many educators nowadays who argue that epsilon-delta definitions should be left aside in starting calculus courses, and I'm left perpetually agog at that trend. All of the (rather basic) calculus texts I've ever seen include the (absolutely essential) definition in their introductory chapter on derivatives. To skip over that is just shooting oneself in the foot.

So your options are basically just these: (a) teach limits and epsilon-delta proofs so as to digest the real definition/explanation, or (b) hand-wave it and leave it in perpetual ineffable mystery. There's really no other path forward.

(I like the Stein/Barcellos text Calculus and Analytic Geometry (1992) that I taught from in college; the sequence highlights this emphasis, and everything seems clearly laid out -- but I guess it's not one of the "big names" and maybe not radical enough to make a huge splash.)

$\endgroup$
  • 4
    $\begingroup$ Do you have any advice on how to teach epsilon delta to students who have never really seen mathematical language and rigour of that kind before? I could happily recite it but then again to me it’s all plain English because I understood its meaning long ago... I wouldn’t know which bits require deciphering, and how $\endgroup$
    – FShrike
    Commented Jul 28, 2021 at 16:49
  • 9
    $\begingroup$ My son is now an engineering student at a university, and I get to provide some help here and there. I personally find that there is way too much emphasis placed on the epsilon-delta proofs in his calculus class. There is a dichotomy between the requirements for engineering majors and for math majors, and the math requirements are what is taught. In my 40-odd-year PhD career across physics and engineering, I never actually used most of what is being taught. And a bunch of useful stuff is glossed over. $\endgroup$
    – Jon Custer
    Commented Jul 29, 2021 at 17:29
  • 5
    $\begingroup$ @JonCuster what are examples of "useful stuff [that] is glossed over" in the calculus course? $\endgroup$
    – KCd
    Commented Jul 29, 2021 at 21:21
  • 3
    $\begingroup$ I think the reason that educators resist teaching epsilon-delta is because it unavoidably requires the student to understand nested logical quantifiers ("for every epsilon, there is a delta..."), and most students get to this point without learning what a "quantifier" is at all. You're suddenly introducing new variables which are not x or y, and the student only has a very loose understanding of where variables "come from" in the first place, so it all feels very foreign to their prior education. One might try to avoid this by skolemizing the formula, but I fear that would make it worse. $\endgroup$
    – Kevin
    Commented Jul 29, 2021 at 21:56
  • 5
    $\begingroup$ Infinitesimal calculus is both consistent and proves/generates "the same" results as epsilon-delta based calculus at the high school level. You can learn and do calculus without using epsilon deltas, even doing it with rigor, via the path of nonstandard analysis/calculus. en.wikipedia.org/wiki/Nonstandard_calculus $\endgroup$
    – Yakk
    Commented Jul 30, 2021 at 16:06
$\begingroup$

I teach calculus at a community college in the U.S. (2 year college, from which many students transfer to a university). I explain limits from about day two in an informal way ("h gets infinitely close to 0"), and talk about the problems with saying infinitely close (but I keep saying it...). I tell students that our learning journey will match the journey made by mathematicians. The 150 years it took between Newton/Leibniz and the precise definition of limits will take us about 6 weeks. I keep saying infinitely close (but not equal), and keep wondering with them how that can be.

You said your example wasn't the important part, but I see a few problems with it pedagogically. Using two letters for one variable ($\delta x$) can be confusing for students. I have most often seen $h$ used for this, and since I'm not from the U.K., I thought I'd check. I googled something like 'derivative from definition' and found this video. It looks like it might help you think about this.

Also, I hope you would not use the combinatorics notation with students. It looks pretty scary, and it helps to keep to one scary new thing at a time. When I teach this, we multiply out $(x+h)^2$, $(x+h)^3$, and $(x+h)^4$. I ask students to look for patterns, and after those three examples they generally see that we always start with $x^n + nx^{n-1}h$. That's all you need: after subtracting $x^n$, every remaining term has a factor of $h$ that can be factored out, and since $h$ does not equal 0, that common factor cancels with the denominator. This makes sense, even without the epsilon-delta definition.
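The pattern students spot can also be checked mechanically. The hypothetical helper `expand_power` below multiplies out $(x+h)^n$ one factor of $(x+h)$ at a time, as one would by hand, and lists the coefficients with entry $i$ belonging to $x^{n-i}h^i$. It is a sketch for checking the pattern, not a proof:

```python
def expand_power(n):
    """Coefficients of (x + h)**n as a list [c0, c1, ..., cn],
    where term i is c_i * x**(n - i) * h**i.  Built by multiplying
    by (x + h) one factor at a time, as when multiplying out by hand."""
    coeffs = [1]  # (x + h)**0 = 1
    for _ in range(n):
        # Multiplying by x keeps each term's power of h; multiplying by h raises it by one.
        times_x = coeffs + [0]
        times_h = [0] + coeffs
        coeffs = [a + b for a, b in zip(times_x, times_h)]
    return coeffs

for n in range(2, 6):
    c = expand_power(n)
    # c[0] == 1 (the x**n term) and c[1] == n (the n*x**(n-1)*h term);
    # every later term carries h**2 or a higher power of h.
    print(n, c)
```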

For your own interest, you might want to check out non-standard analysis. (Berkeley was wrong, Newton's infinitesimals could find a firm foundation. Instead of using limits, we expand our number system.) It doesn't help students to understand the basics, but it's kind of mind-bending.

$\endgroup$
  • 3
    $\begingroup$ When I first learned calculus in Australia (in year 11) we used $h$ and $k$. At around the same time in Physics we were using $\Delta x$ and $\Delta t$ to define velocity, so when we eventually started using $\delta$ notation it wasn't hard. I don't remember seeing or teaching the formal definition of a limit or of continuity until uni. $\endgroup$
    – Peter
    Commented Jul 29, 2021 at 12:33
  • 2
    $\begingroup$ If anything, I think you're under-selling infinitesimals. They are used constantly in the sciences and engineering, so it's not as though they're some exotic topic that students could be shielded from. They also don't require advanced knowledge to understand. I have a freshman-level treatment here lightandmatter.com/fund . See the index under "infinite and infinitesimal quantities." $\endgroup$
    – user507
    Commented Jul 29, 2021 at 13:39
  • 4
    $\begingroup$ Many students find infinitesimals far more intuitive than limits. This is why students evaluate 0.999... as 1-ε (which is quite reasonable if one is working in hyperreals instead of reals). The downside of working with infinitesimals is that they aren't part of ℝ , so any student who is taught about infinitesimals must also be warned that they can't use them (i.e., because they should use limits instead). $\endgroup$
    – Brian
    Commented Jul 29, 2021 at 14:38
  • 4
    $\begingroup$ (+1) for (among other things) "When I teach this, we multiply out (x+h)^2, (x+h)^3, and (x+h)^4. I ask students to look for patterns, and after those three examples they generally see that we always start with x^n + n*x^(n-1). That's all you need." In fact, you don't have to multiply all these out to get the pattern. For instance, $(x+h)^4 = (x+h)^3(x+h) = (x^3 + 3x^2h + \cdots)(x+h) = x^4+x^3h+3x^3h + \cdots = x^4 + 4x^3h + \cdots,$ and using this we can get $(x+h)^5=x^5+5x^4h + \cdots$ in the same way. This is the basis for a proof by induction, but no need to go there at this level $\ldots$ $\endgroup$ Commented Jul 30, 2021 at 18:36
  • 1
    $\begingroup$ For those who ask how we know for sure the pattern continues, you can tell them (after class) to consider something like $(x^8 + 8x^7h + \cdots)(x+h),$ and notice how you get $x^9 + 9x^8h + \cdots,$ and maybe mention induction if they had it in precalculus (or just point out that the 8th power to 9th power basically shows that if it works up to some point, then it must also work just afterwards -- why? Try replacing $8$ and $9$ with $37$ and $38,$ or with any two consecutive positive integers). A vertical arrangement of the terms in which like terms are in the same column works best. $\endgroup$ Commented Jul 30, 2021 at 21:44
$\begingroup$

We routinely get questions about pushing more rigor into early calculus. Usually they come from outstanding students and rest on "sample of one: I liked it that way" logic.

There's a reason why things are the way they are: most students would get the opposite of a pedagogical benefit from increased rigor in early calculus. What holds people back is not a lack of rationales but a lack of drill, working ability in algebra and trig, etc.

For a weak student, you would be better off helping them by just doing more drill. That builds familiarity. To the extent you discuss concepts, analogies like tangents and the like will help more.

Really, truly: it is not the lack of epsilon-delta that holds someone back from integration by parts. It is the lack of practice with something new.

Humans are not rule based computers. They are imperfect systems that need practice.

$\endgroup$
  • 5
    $\begingroup$ I'd say that drill and practice are techniques to attempt to turn humans into rule based computers. However, rule based computers don't understand math, they can just execute it. So while lack of practice may hold someone back from executing integration by parts by following the rules, practicing a rule without understanding it can make it harder to understand later. $\endgroup$
    – Eph
    Commented Jul 30, 2021 at 13:08
  • 4
    $\begingroup$ Sure. However, it is common to understand things on different levels. Do kids doing 1 + 1 need the intuition of Terry Tao building the real numbers? Furthermore, intuition or feel can be a very different thing from rigorous proof. But of course, the more separate frames you have for something (feel, proof, drill), the better. And note that this fellow is designing a solution for a problem based on what he speculates students will find lacking. Speculates, not observes. That basically makes it one more "I would have liked more rigor, so let's do that with everyone, despite how atypical I am" post. $\endgroup$
    – guest
    Commented Jul 30, 2021 at 14:08
  • 2
    $\begingroup$ This answer seems to have nothing to do with the question. The OP describes a student who is curious about this issue and wants a straight answer. $\endgroup$
    – user507
    Commented Jul 30, 2021 at 14:44
  • 8
    $\begingroup$ A student he has never actually encountered. Plus speculation that he likely will. And my point is that it's a hammer looking for nails in a world of screws. $\endgroup$
    – guest
    Commented Jul 30, 2021 at 17:25
  • $\begingroup$ While it is true that things are the way that they are for a reason, assuming that the reason that they are that way is a good reason is begging the question. Sometimes things are the way they are pedagogically for a bad reason (e.g., an imaginary example: my teacher did it this way, because their teacher did it this way, because their teacher did it this way, because … that teacher lost their notes that day and improvised a half-remembered version of what they'd meant to discuss). $\endgroup$
    – LSpice
    Commented Oct 20, 2022 at 23:52
$\begingroup$

Start with a numerical example. Say you want to find the gradient of the tangent to $y=x^2$ at $x=1$. Obviously the point itself is $(1,1)$. Pick a nearby point, say $x=1.1$. A moment with a calculator shows $y=1.21$ and the gradient of this chord is $0.21/0.1=2.1$. Now pick a closer point, $x=1.01$. We again use the calculator to find $y=1.0201$ and the gradient of this chord is $0.0201/0.01=2.01$. And so on. The subsequent terms are $2.001$, $2.0001$, $2.00001$, ... which are clearly approaching $2$.
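The same table can be produced with a few lines of Python. This is a minimal sketch that assumes exact fractions, so that floating-point rounding doesn't muddy the pattern; `chord_gradient` and `square` are illustrative names:

```python
from fractions import Fraction

def chord_gradient(f, a, dx):
    """Gradient of the chord of y = f(x) from x = a to x = a + dx."""
    return (f(a + dx) - f(a)) / dx

def square(x):
    return x * x

a = Fraction(1)
for k in range(1, 6):
    dx = Fraction(1, 10 ** k)   # 0.1, 0.01, 0.001, ...
    # other point's x-value, then the chord gradient (exactly 2 + dx here)
    print(float(a + dx), float(chord_gradient(square, a, dx)))
```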

Once you have built up a bit of numerical intuition, so that they can see that as the other point gets closer and closer to the tangent point, the gradient gets closer and closer to $2$, then introduce algebra, with $x$ and $\delta x$ in place of $1$ and $0.000...1$. Match each algebraic term to the corresponding numerical series. That way, they can see some terms stay the same and others rapidly vanish into tinier and tinier corrections. Then state that there is some mathematical machinery needed to make this "limit" process formal and rigorous, which they will be introduced to later, but for the moment they should accept the intuitive proposal that as the two points get closer together, the chord gets closer to the tangent line, and you can safely assume that those terms shrinking towards zero can be dropped. Indeed, you might need to be careful that they don't go too far the other way and regard it as "just obvious". Mention the possibility of sharp corners, and they can probably see how that would go wrong without you having to go into details.
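For the $y = x^2$ example, the matching algebra might be written as follows. With $\delta x = 0.1, 0.01, \dots$ the right-hand side reproduces $2.1, 2.01, \dots$ term by term, and the $\delta x$ term is exactly the part that shrinks:

$$\frac{(1+\delta x)^2 - 1^2}{\delta x} = \frac{1 + 2\,\delta x + (\delta x)^2 - 1}{\delta x} = 2 + \delta x, \qquad \frac{(x+\delta x)^2 - x^2}{\delta x} = 2x + \delta x.$$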

Once you have established it for a simple case like $y=x^2$, do a few more simple polynomial cases using the same numerical-then-algebraic approach. You should be able to go through the steps more quickly each time; then check whether they're happy to move to a purely algebraic approach. At any later point, if they feel unsure, they can always just stick in a tiny number like $0.000001$ and see what happens.

$\endgroup$
  • 3
    $\begingroup$ Yes, numerical examples are good, thanks $\endgroup$
    – FShrike
    Commented Jul 28, 2021 at 22:34
  • 9
    $\begingroup$ Moreover, the expression inside the limit definition of the derivative really is exactly the formula that calculates the slopes of those secant lines. This can help illustrate that the expression really is meaningful and doesn't (in my opinion) actually suffer from any zero-divided-by-zero paradox—to me thinking of that as a paradox is a result of not yet understanding that a limit is more general than plugging into a continuous function. $\endgroup$ Commented Jul 29, 2021 at 0:44
  • 1
    $\begingroup$ @GregMartin I don't think of it as a paradox, I just used to, and I find it likely that my classmates-to-be in a few months time may well find it paradoxical, since limits won't be explained in depth to them $\endgroup$
    – FShrike
    Commented Jul 29, 2021 at 9:43
$\begingroup$

I admit that I'm unable to follow the proof you give as an example in your question, but am I correct in assuming that your question is simply how to reconcile $dy/dx$ with the fact that $dx$ approaches zero, and hence is considered zero by your students?

Then I'd simply explain the limit operation visually by exploring a curve. This can be done on a blackboard. The limit operation is essentially a deep zoom into the curve at a given $x$, much like a zoom into the Mandelbrot set. I'd visually present this as a succession of increasingly detailed looks through a symbolic "magnifying glass". All differentiable curves have one thing in common: when you zoom in deeper and deeper, the detail becomes a straighter and straighter line. The slope of the emerging straight line at that point is the derivative. That the result becomes more and more exact the deeper you zoom in is obvious, and the mental jump to saying "the virtual end point of this zoom is the result we are looking for" is quite intuitive.
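The "zoom" picture can be made measurable. In this sketch, which is an assumption of ours rather than part of the answer, "how straight the curve looks" is measured as the largest vertical gap between the curve and its tangent line over a window of half-width $w$, divided by $w$ so that the gap is measured at the zoomed-in scale. The curve $y = x^3$ at $x = 1$ (tangent slope 3) and the name `straightness_gap` are illustrative choices, and the gap is only sampled at finitely many points:

```python
def straightness_gap(f, slope, a, w, samples=200):
    """Largest sampled vertical gap between y = f(x) and its tangent line at x = a,
    over the window [a - w, a + w], divided by the half-width w.
    Dividing by w mimics zooming in: it is the gap as seen at the new scale."""
    fa = f(a)
    gap = 0.0
    for i in range(samples + 1):
        x = a - w + 2 * w * i / samples
        gap = max(gap, abs(f(x) - (fa + slope * (x - a))))
    return gap / w

def cube(x):
    return x ** 3

# At a = 1 the tangent slope of y = x**3 is 3; the scaled gap here is about 3w + w**2.
for w in (1.0, 0.1, 0.01, 0.001):
    print(w, straightness_gap(cube, 3.0, 1.0, w))
```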

Then you have a wonderful motivation and opening to present the mathematical tools which perform exactly that process in a rigorous and proven fashion. The rigorous mathematical toolkit of calculus is necessary because intuition leads easily astray with infinite values. Establishing firm theoretical ground is one of the great achievements of humanity.

The crucial point is to understand a limit operation as a process, not a static value.

When, for a differentiable function $f$, we write

$f'(x) = \lim \limits_{\Delta x \to 0} \frac{f(x+\Delta x)-f(x)}{\Delta x} = \frac{dy}{dx}$

we state that, as we make $\Delta x$ smaller and smaller, the fraction gets closer and closer to a finite value. The limit notation describes the dynamic behavior of the formula as the variable $\Delta x$ becomes smaller and smaller. The value it approaches is not the result of setting $\Delta x$ to 0; it is the vanishing point of an operation that does not have an end. We can prove that this vanishing point exists and what value it has, but in general it is not reached for any value of $\Delta x$, least of all for 0 (where the fraction is not even defined). The meaning of the formula is that we can get as close as we like by making $\Delta x$ small enough: a process.
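To make "as close as we like" concrete, here is a sketch using $f(x) = x^2$ at $x = 3$, where the limit is $6$ (an illustrative choice). The hypothetical helper `step_needed` halves $\Delta x$ until the quotient lands within a chosen tolerance $\varepsilon$ of $6$. It only tries a few tolerances, and it would loop forever if handed a wrong limit; the actual definition asks that a suitable $\Delta x$ exist for every $\varepsilon > 0$:

```python
from fractions import Fraction

def quotient(f, x, dx):
    """The difference quotient (f(x + dx) - f(x)) / dx for dx != 0."""
    return (f(x + dx) - f(x)) / dx

def step_needed(f, x, limit, eps):
    """Halve dx, starting from 1, until the difference quotient is within eps
    of the proposed limit; return that dx and the quotient reached there.
    (Only terminates if the proposed limit is correct.)"""
    dx = Fraction(1)
    q = quotient(f, x, dx)
    while abs(q - limit) >= eps:
        dx /= 2
        q = quotient(f, x, dx)
    return dx, q

def square(x):
    return x * x

x = Fraction(3)
for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    dx, q = step_needed(square, x, 6, eps)
    # Here the quotient is exactly 6 + dx: within eps of 6, yet never equal to 6.
    print(f"eps = {eps}: dx = {dx}, quotient - 6 = {q - 6}")
```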

$\endgroup$
  • 1
    $\begingroup$ I will include that in my "hand-wavey" repertoire for when less formal explanations are good enough! $\endgroup$
    – FShrike
    Commented Jul 29, 2021 at 9:53
  • 3
    $\begingroup$ @FShrike It was not meant to be hand-wavy: It was meant as a depiction that can be intuitively grasped -- and is entirely correct! -- of the mathematical tools introduced in the same course. The "magnification" is exactly what happens when one performs the limes operation. $\endgroup$ Commented Jul 29, 2021 at 10:04
  • 3
    $\begingroup$ What is a "limes operation"? $\endgroup$ Commented Jul 29, 2021 at 23:12
  • 1
    $\begingroup$ At first I wondered if "limes" was simply a typo for "limit". Now I'm not so sure - could you please clarify further? $\endgroup$
    – J W
    Commented Jul 30, 2021 at 11:24
  • 1
    $\begingroup$ @JW Sorry for the confusion -- it is indeed the limit. In German we say limes (the Latin word); the nice thing is that it is not conflated with other limits but used specifically for $\lim$. I thought that naturally that's the case everywhere but English has limit, so that is what's used. $\endgroup$ Commented Jul 30, 2021 at 11:34
$\begingroup$

First off - I 100% agree with Collins here that there's no shortcuts. To really understand how derivatives work, you need to learn the epsilon-delta definition. But it's my sense that you're not really shooting for that here - it sounds like what you want is a way to convince a student, not necessarily a way to prove it to them. The difference can be hard for someone advanced in math to spot, because you're so used to proving things that you've forgotten that you once didn't know how!

When it comes to derivatives: the funny thing is, most students (I speak from the perspective of the US math curriculum here) are actually already comfortable with one example of dividing by zero - removable discontinuities, or "holes". For example, the function

$$\frac{x(x + 2)}{x}$$

has a "hole" in it at $x = 0$, because there the expression becomes $0/0$, which is undefined. But this is probably very much a "well, technically" sort of situation for them: technically it's not defined at $x = 0$, but look, you can just cancel the $x$'s and it works just fine, filling the hole at $(0, 2)$. Thus, in this case, it makes sense to say that what was $0/0$ turns out to be $2$.
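A quick way to show a class the hole, sketched in Python (the name `g` is just for illustration): evaluating the uncancelled expression near $0$ gives values close to $2$, while evaluating it at $0$ itself fails.

```python
def g(x):
    # The uncancelled form x(x + 2)/x: at x = 0 this is 0/0.
    return x * (x + 2) / x

for x in (0.1, 0.01, -0.01, 0.001):
    print(x, g(x))          # values close to 2 (exactly x + 2 up to rounding)

try:
    g(0)
except ZeroDivisionError:
    print("g(0) is undefined: 0/0")

# After cancelling x (valid only for x != 0) we get x + 2, whose value at 0 is 2.
```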

You can move from there into a convincing explanation of why derivatives work: we're not dividing by zero, we're filling a hole! The trick is to find a way to write the function that admits the cancellation you need. Of course, that isn't always easy or even possible with pre-calculus techniques; but in my experience most students are prepared to accept "it makes sense in this case, and it will make sense in these other cases once you learn a bit more" as a convincing argument.
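For the derivative of $x^2$, the hole-filling argument might be written out as follows; the only step that needs $h \neq 0$ is the cancellation, and filling the hole at $h = 0$ gives $2x$:

$$\frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = \frac{h(2x + h)}{h} = 2x + h \qquad (h \neq 0).$$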

$\endgroup$
$\begingroup$

This might be opening up another can of worms, but have you considered introducing infinitesimals? Like imaginary numbers, they are created by appending a new element to the reals, but instead of $\sqrt{-1}$ the element added is smaller than any positive real number but greater than 0, effectively the "$dx$" from the integral. Calculus can be derived consistently from this as well as from limits, and to some people it might seem more intuitive.

On the other hand, to others it might just be an additional source of confusion, but it might at least help you feel better about hand-waving justifications.
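One way to get a feel in code for "appending a new element to the reals" is the closely related, but different, algebra of dual numbers $a + b\varepsilon$ with $\varepsilon^2 = 0$ (in place of $i^2 = -1$). This is a swapped-in technique, not the hyperreal infinitesimals described above: dual numbers have no ordering that makes $\varepsilon$ "smaller than every positive real". They do show how carrying an infinitesimal-like part through ordinary algebra produces the derivative. The `Dual` class and `derivative` function are hypothetical illustrations supporting only addition and multiplication:

```python
class Dual:
    """Numbers a + b*eps, where eps*eps is defined to be 0."""

    def __init__(self, a, b=0):
        self.a, self.b = a, b          # real part, eps part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps + bd eps^2, and eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at x + eps; the eps part of the result is f'(x)."""
    return f(Dual(x, 1)).b

def p(x):
    return 3 * x * x * x + 2 * x + 5    # p'(x) = 9x^2 + 2

print(derivative(p, 2))                 # p'(2) = 9*4 + 2
```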

$\endgroup$
  • 5
    $\begingroup$ I would, but ... I have not undertaken a formal study of infinitesimal calculus. I have only read articles on hyperreal and surreal number systems very briefly, and it would be disingenuous to try and teach from that faulty basis. Thanks for the thought though - one day I’ll study infinitesimals properly $\endgroup$
    – FShrike
    Commented Jul 29, 2021 at 0:08
  • 2
    $\begingroup$ Infinitesimals are used constantly in the sciences and engineering, so it's not as though they're some exotic topic that students could be shielded from. They also don't require advanced knowledge to understand. I have a freshman-level treatment here lightandmatter.com/fund . See the index under "infinite and infinitesimal quantities." One doesn't need to know about the hyperreals/surreals in order to understand how to calculate with infinitesimals, any more than a kid at age 10 needs to understand the completeness property of the reals in order to know how to calculate with the reals. $\endgroup$
    – user507
    Commented Jul 29, 2021 at 13:42
    $\begingroup$ The thing is, they are exotic if you decide to give a rigorous exposition. If you're OK with an informal treatment, then I agree that nonstandard analysis has its advantages. But if your students wish to continue studying mathematics, it would be easier for them to have the standard background than to switch from nonstandard analysis at some point. $\endgroup$ Commented May 31, 2022 at 11:06
$\begingroup$

The only way I found to explain this is with infinitesimals. It's not that $dX$ is vanishingly small... it's that it is "infinitesimally small."

My usual approach to explaining all of this is to start with Zeno's paradoxes. His most famous is good enough -- the idea that you can't run to the end of a football field without first running to the half way point, and you can't get there without running to the quarter way point, and so on. One might argue that this shows motion is impossible, for it would require infinitely many steps! Really all it does is show that this way of thinking is insufficient to match our physical reality.

Calculus was originally called "the calculus of infinitesimals." Calculus simply means "a procedural way of calculating something," but this particular calculus was so astonishingly valuable that it crowded out the term, and now we just refer to it as "calculus." It turns out that it is mighty difficult to come up with a system of mathematics which handles infinitesimals and is consistent. Many have tried. Newton and Leibniz were the first to come up with a formal way of dealing with infinitesimals which was consistent.

We see quickly that such a calculus resolves Zeno's paradox, defining a meaningful way to add up an infinite number of infinitesimally small segments. And that, I find, is key to understanding why we do what we do in calculus. The purpose of calculus was to solve equations that could only be solved if we could take an infinite number of steps. To do that, we created limits, derivatives, and integrals, all of which are consistent ways of allowing infinitesimals to coexist meaningfully with normal numbers.
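The football-field version of Zeno's paradox can be tabulated directly. In this sketch the field length is taken as 1 and `zeno_partial_sum` is an illustrative name: after the first $n$ stages the runner has covered $\frac12 + \frac14 + \cdots + \frac1{2^n} = 1 - \frac1{2^n}$, never all of the field in finitely many stages, but as close to it as we like.

```python
from fractions import Fraction

def zeno_partial_sum(n):
    """Distance covered after the first n of Zeno's stages: 1/2 + 1/4 + ... + 1/2**n."""
    return sum(Fraction(1, 2 ** k) for k in range(1, n + 1))

for n in (1, 2, 5, 10, 20):
    s = zeno_partial_sum(n)
    # Each partial sum falls short of 1 by exactly 1/2**n.
    print(n, s, 1 - s)
```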

$\endgroup$
  • $\begingroup$ I'm glad I'm not alone in this; I always intended to introduce calculus to tutees and friends struggling with limits with Zeno's paradox, among other things. It is a useful mental exercise, and solvitur ambulando shows that there isn't really a paradox at all $\endgroup$
    – FShrike
    Commented Jul 29, 2021 at 9:48
  • 1
    $\begingroup$ The approach I like to resolve Zeno's paradox is to define the goal point as having two key characteristics: 1. the runner can never reach any point beyond it after some time T (the runner might have been there at some earlier time, e.g. before the "race" started, but that doesn't count), but 2. It's impossible to pick any point on the starting side of that point which the runner won't be beyond at some time after T. Only one point can meet these criteria. If there were two such points, then the runner would be simultaneously forbidden from reaching, and required to reach, points between them. $\endgroup$
    – supercat
    Commented Jul 30, 2021 at 16:39
$\begingroup$

So many answers, but none has yet pointed out that your proposed proof of the derivative of $x^n$ with respect to $x$ is wrong because it uses the wrong notation. There is a difference between Big-O and little-o (see the formal definitions of Landau notation for details). $ \def\lfrac#1#2{{\large\frac{#1}{#2}}} $

That said, there is a way to make things rigorous without sacrificing on the intuition. See, suppose you have variables [1] $x,y$ that are varying with respect to $t$ (e.g. time) such that $\lfrac{dx}{dt}$ and $\lfrac{dy}{dt}$ are both defined at any point. We can focus our attention on any specific point, and consider any other point (with a different value for $t$). Let $Δx,Δy,Δt$ be the changes in $x,y,t$ when you move from the first to the second point. By definition, $\lfrac{Δx}{Δt} ≈ \lfrac{dx}{dt}$ as $Δt → 0$, which can be intuitively expressed by saying that, if $Δt$ eventually gets close enough but not equal to $0$, then $\lfrac{Δx}{Δt}$ eventually stays close enough to $\lfrac{dx}{dt}$. Similarly $\lfrac{Δy}{Δt} ≈ \lfrac{dy}{dt}$ as $Δt → 0$.

What is $\lfrac{d(x·y)}{dt}$? $Δ(x·y) = (x+Δx)·(y+Δy)-x·y$ $= x·Δy+Δx·y+Δx·Δy$, so we have $\lfrac{Δ(x·y)}{Δt} = x·\lfrac{Δy}{Δt}+\lfrac{Δx}{Δt}·y+\lfrac{Δx}{Δt}·\lfrac{Δy}{Δt}·Δt$ $≈ x·\lfrac{dy}{dt}+\lfrac{dx}{dt}·y+\lfrac{dx}{dt}·\lfrac{dy}{dt}·Δt$ $≈ x·\lfrac{dy}{dt}+\lfrac{dx}{dt}·y$ as $Δt → 0$. Why can we make these approximations? Because $x,y,\lfrac{dx}{dt},\lfrac{dy}{dt}$ are all finite, and approximate equalities are closed under addition and multiplication by a finite factor, and because $Δt ≈ 0$ as $Δt → 0$. Therefore $\lfrac{d(x·y)}{dt} = x·\lfrac{dy}{dt}+\lfrac{dx}{dt}·y$.
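As a numerical sanity check (not a substitute for the argument), here is a minimal Python sketch using exact rational arithmetic. The choices $x(t) = t^2$, $y(t) = t^3$ and the point $t = 1$ are assumptions made only for illustration, and `difference_quotients` is a hypothetical helper. It confirms that the three-term expansion of $Δ(x·y)/Δt$ holds exactly for every nonzero $Δt$, and that the extra term $(Δx/Δt)·(Δy/Δt)·Δt$ shrinks toward $0$.

```python
from fractions import Fraction

# Hypothetical example quantities varying with t: x(t) = t**2, y(t) = t**3.
def x(t):
    return t ** 2

def y(t):
    return t ** 3

t0 = Fraction(1)
dx_dt = 2 * t0        # exact derivative of t**2 at t0
dy_dt = 3 * t0 ** 2   # exact derivative of t**3 at t0
product_rule = x(t0) * dy_dt + dx_dt * y(t0)  # x·dy/dt + dx/dt·y, equal to 5 here

def difference_quotients(dt):
    """Return Δ(x·y)/Δt and the three terms of its expansion (dt must be nonzero)."""
    dx = x(t0 + dt) - x(t0)
    dy = y(t0 + dt) - y(t0)
    lhs = (x(t0 + dt) * y(t0 + dt) - x(t0) * y(t0)) / dt
    terms = (x(t0) * (dy / dt), (dx / dt) * y(t0), (dx / dt) * (dy / dt) * dt)
    return lhs, terms

for k in range(1, 6):
    dt = Fraction(1, 10 ** k)    # Δt shrinks but is never 0
    lhs, terms = difference_quotients(dt)
    assert lhs == sum(terms)     # the expansion is an exact identity for Δt ≠ 0
    print(float(dt), float(lhs), float(terms[2]))  # lhs -> 5, last term -> 0
```

Using `Fraction` rather than floats keeps rounding error out of the picture, so the printed values change only because $Δt$ shrinks.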

So we have rigorously proven the product rule for derivatives while keeping the algebraic benefits of Leibniz notation. We can easily obtain $\lfrac{d(t^k)}{dt}$ for every $k∈ℕ^+$ using the product rule and induction. Trivially $\lfrac{d(t^1)}{dt} = \lfrac{dt}{dt} = 1 = 1·t^0$. Given any $k∈ℕ^+$ such that $\lfrac{d(t^k)}{dt} = k·t^{k-1}$, we have $\lfrac{d(t^{k+1})}{dt} = \lfrac{d(t^k·t)}{dt}$ $= t^k·\lfrac{dt}{dt}+\lfrac{d(t^k)}{dt}·t$ $= t^k·1+k·t^{k-1}·t = (k+1)·t^k$, by the product rule. Therefore by induction we get the desired theorem.
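The induction step can be mirrored mechanically with dual numbers (forward-mode automatic differentiation). This is a swapped-in technique, not part of the argument above: each quantity carries its value together with its derivative with respect to $t$, and multiplication is defined by the product rule. In this sketch the class `Dual`, the function `power` and the sample point $t = 3$ are hypothetical choices; building $t^k$ by repeated multiplication then yields the derivative $k·t^{k-1}$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A quantity paired with its derivative with respect to t."""
    val: float
    der: float

    def __mul__(self, other):
        # Product rule: d(u·w)/dt = u·dw/dt + du/dt·w
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

def power(base, k):
    """base**k for an integer k >= 1, built as ((base·base)·base)···,
    mirroring the induction step t**(k+1) = t**k · t."""
    result = base
    for _ in range(k - 1):
        result = result * base
    return result

t = Dual(3.0, 1.0)  # the variable t at the point t = 3, with dt/dt = 1
for k in range(1, 6):
    p = power(t, k)
    print(k, p.der, k * t.val ** (k - 1))  # the two columns agree: k·t**(k-1)
```

The multiplication order `result * base` matches the answer's $t^k·\frac{dt}{dt}+\frac{d(t^k)}{dt}·t$ term for term.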

Again, this is completely rigorous, and yet there is no need for a hand-waved bound on arbitrarily many terms.

We can extend this to negative integers and even rational powers. Consider any point at which $t ≠ 0$. Take any $k∈ℕ^+$. Then $\lfrac{d(t^k·t^{-k})}{dt} = \lfrac{d(1)}{dt} = 0$. Also $\lfrac{d(t^k·t^{-k})}{dt} = t^k·\lfrac{d(t^{-k})}{dt}+\lfrac{d(t^k)}{dt}·t^{-k}$ $= t^k·\lfrac{d(t^{-k})}{dt}+k·t^{k-1}·t^{-k} = t^k·\lfrac{d(t^{-k})}{dt}+k·t^{-1}$ by the product rule. Thus we have $\lfrac{d(t^{-k})}{dt} = -t^{-k}·(k·t^{-1}) = -k·t^{-k-1}$, as desired.
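A quick exact-arithmetic check of the negative-exponent result, as a sketch assuming the sample values $k = 3$ and $t = 2$ (so the formula gives $-3/16$); `difference_quotient` is a hypothetical helper. It only shows the average rates of change settling toward $-k·t^{-k-1}$; the proof is the product-rule argument.

```python
from fractions import Fraction

def difference_quotient(f, t, dt):
    """Average rate of change of f between t and t + dt (dt must be nonzero)."""
    return (f(t + dt) - f(t)) / dt

k = 3
t = Fraction(2)

def f(s):
    return s ** -k          # t**(-k), defined for s != 0

claimed = -k * t ** (-k - 1)  # -k·t**(-k-1), which is -3/16 here

for n in range(1, 6):
    dt = Fraction(1, 10 ** n)
    print(float(dt), float(difference_quotient(f, t, dt)), float(claimed))
```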

I leave it as an exercise for you to extend it to rational powers, which can be done via the same tools. Hint: Take any $q∈ℤ$ such that $q > 0$. Consider any point at which $t > 0$. Then prove by induction that $\lfrac{d(t^{k/q})}{dt} = k·t^{(k-1)/q}·\lfrac{d(t^{1/q})}{dt}$ for every $k∈ℕ$. Thus $1 = \lfrac{d(t^{q/q})}{dt} = q·t^{(q-1)/q}·\lfrac{d(t^{1/q})}{dt}$, and so $\lfrac{d(t^{1/q})}{dt} = 1/q·t^{1/q-1}$, yielding $\lfrac{d(t^{k/q})}{dt} = k/q·t^{k/q-1}$ as desired.
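For the rational case, a floating-point sketch compares a symmetric difference quotient with $k/q·t^{k/q-1}$ at a few assumed sample values (for instance $k/q = 2/3$ at $t = 8$, where the formula gives $1/3$). The helper names are hypothetical, and floating point only approximates, so agreement is to within a small tolerance, not exact.

```python
def central_difference(f, t, h=1e-6):
    """Symmetric difference quotient: a floating-point estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def check_rational_power(k, q, t):
    """Return (numerical estimate, formula (k/q)·t**(k/q - 1)) for t > 0, q > 0."""
    estimate = central_difference(lambda s: s ** (k / q), t)
    formula = (k / q) * t ** (k / q - 1)
    return estimate, formula

for k, q, t in [(2, 3, 8.0), (5, 2, 4.0), (1, 4, 16.0)]:
    print(k, q, t, check_rational_power(k, q, t))
```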

This formalization of derivatives extends easily to implicit differentiation. We simply define $\lfrac{dy}{dx} = r$ iff $r$ is a constant and, as $Δt → 0$, both $Δx → 0$ and $\lfrac{Δy}{Δx} ≈ r$. Intuitively, $\lfrac{dy}{dx}$ is defined to be $r$ if $r$ is a fixed constant and moving the other point close enough causes the value of $x$ to also move close enough and the ratio of $Δy$ over $Δx$ to stay close enough to $r$.
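The definition can be watched in action with a sketch that assumes $x = t^2$ and $y = t^3$ near $t = 2$ (hypothetical choices). There $\frac{dx}{dt} = 4$ and $\frac{dy}{dt} = 12$, and the ratio $Δy/Δx$ settles toward $3$ as $Δt → 0$, even though $y$ is never written as a function of $x$. The helper `ratio_of_changes` is hypothetical; it refuses any $Δt$ for which $Δx = 0$, since the ratio is then undefined.

```python
from fractions import Fraction

# Hypothetical example: x and y both vary with t, as x = t**2 and y = t**3.
def x(t):
    return t ** 2

def y(t):
    return t ** 3

def ratio_of_changes(t, dt):
    """Δy/Δx when t moves to t + dt; undefined when Δx happens to be 0."""
    dx = x(t + dt) - x(t)
    dy = y(t + dt) - y(t)
    if dx == 0:
        raise ZeroDivisionError("Δx = 0 for this Δt, so Δy/Δx is undefined")
    return dy / dx

t0 = Fraction(2)  # here dx/dt = 4 and dy/dt = 12, so dy/dx should be 3
for n in range(1, 6):
    dt = Fraction(1, 10 ** n)
    print(float(dt), float(ratio_of_changes(t0, dt)))
```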

Now we can easily prove the chain rule $\lfrac{dz}{dx} = \lfrac{dz}{dy}·\lfrac{dy}{dx}$ at any point where $\lfrac{dz}{dy},\lfrac{dy}{dx}$ are both defined. I leave that as an exercise.

Implicit differentiation is of course useful in practical applications of derivatives to many fields, and here is one example.

Incidentally, it is actually mathematically difficult to prove $\lfrac{d(t^r)}{dt} = r·t^{r-1}$ for arbitrary $r∈ℝ$ and $t > 0$, in the sense that the easiest way is probably via the exponential function and natural logarithm and via the definition of $t^r = \exp(r·\ln(t))$. None of these are easy, so I would say that we cannot handle this at the high-school level.
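To make the definition $t^r = \exp(r·\ln(t))$ concrete, a floating-point sketch with the assumed sample values $r = \sqrt{2}$ and $t = 3$ checks that a difference quotient agrees with $r·t^{r-1}$. This is a numerical check only and proves nothing: the hard part (establishing the properties of $\exp$ and $\ln$) is simply taken for granted through Python's `math.exp` and `math.log`. The helper names are hypothetical.

```python
import math

def real_power(t, r):
    """t**r for t > 0, via the definition exp(r·ln t)."""
    if t <= 0:
        raise ValueError("this definition needs t > 0")
    return math.exp(r * math.log(t))

def claimed_derivative(t, r):
    """The power rule r·t**(r-1), using the same definition."""
    return r * real_power(t, r - 1)

def central_difference(f, t, h=1e-6):
    """Symmetric difference quotient: a floating-point estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

r = math.sqrt(2)   # an irrational exponent (assumed sample value)
t = 3.0            # assumed sample point, t > 0
estimate = central_difference(lambda s: real_power(s, r), t)
print(estimate, claimed_derivative(t, r))  # agree to several decimal places
```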

[1] Here "variable" is in the older sense of a symbol used to denote a varying quantity, and not in the modern sense of a symbol in a logical formalism of mathematics.

$\endgroup$
5
  • $\begingroup$ I will read through the rest of your post soon - looking up the page on Landau notation, isn't "f is dominated by g" an extremely similar statement to "f is bounded by g"? I agree in hindsight that using Big-O here is better, but what specifically is wrong with the little-o when I'm talking about vanishingly small things? $\endgroup$
    – FShrike
    Commented Jul 30, 2021 at 18:54
  • $\begingroup$ Proofs involving big $\Delta$ instead of limits strike me as non standard analysis proofs, instead of conventional calculus proofs. I suppose that notation is much more intuitive to the new student! $\endgroup$
    – FShrike
    Commented Jul 30, 2021 at 19:00
  • 2
    $\begingroup$ @FShrike: Because $x^2$ is not $o(x^2)$ (as $x→0$), as you can see from the definition of little-o. Also, I did not use any non-standard analysis, and do not recommend any such thing, because the required foundations for NSA are ridiculously complex compared to what I did use, which is very ordinary reasoning about real numbers. In particular, no student who cannot grasp basic real analysis can ever grasp non-principal ultrafilters... $\endgroup$
    – user21820
    Commented Jul 30, 2021 at 19:09
  • $\begingroup$ This is aside from the question now, but what would be a better notation to capture the idea that the remainder terms are a function of successively higher powers of $\delta x^2$ (and therefore successively more vanishing terms)? I could just say the remainder terms are $(\delta x)^2\cdot g(x,\delta x)$ where $g$ is a finite valued polynomial, but that seems clumsy. $\endgroup$
    – FShrike
    Commented Jul 30, 2021 at 19:13
  • 1
    $\begingroup$ @FShrike: You're not going to escape the unconvincing "..." (ellipsis) in your argument without using induction at some point. That is why the approach I gave is probably the cleanest. In other words, if I were a logical student I would question whether you can really justify that all your little bits didn't add up to too much. How convincing you want to be is of course up to you, but I think you might as well just get the product rule first, since we want it anyway... $\endgroup$
    – user21820
    Commented Jul 30, 2021 at 19:15
4
$\begingroup$

I do not know your textbooks, but when I first learned about limits in school (I believe in my school system that happened when we were about 15 or 16 years old, but it was an awfully long time ago, so I'm not sure), the $\delta-\epsilon$ approach was used, with a strong geometric aspect. I cannot remember ever having any inkling that there were divisions by zero or anything along those lines.

I.e., $\lim$ was introduced right from the start like in this picture:

https://en.wikipedia.org/wiki/Limit_(mathematics)#/media/File:L%C3%ADmite_01.svg

The formalism "Whenever a point x is within a distance δ of c, the value f(x) is within a distance ε of L." translates directly into drawing lines on that image. Don't start out by drawing a tangent at $(x,f(x))$; instead, draw a line from $(x-\delta,f(x-\delta))$ to $(x+\delta,f(x+\delta))$. Then pick a smaller $\delta$ and draw more lines. It is intuitively visible that they converge to the tangent.

At this point, you can point out the relationship of the quotient $\epsilon/\delta$ to the inclination/angle of the line.

Obviously you should pick a beneficial example at first, nothing too weird. $f(x)=x^2$, and don't pick $0$ as your $c$.
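A sketch of the chord-drawing exercise in Python with exact fractions, assuming the point $c = 3$ and the helper names `chord_slope`, `square` and `cube` (all hypothetical choices). One thing worth knowing before trying it: for $f(x) = x^2$ the symmetric chord from $(c-\delta, f(c-\delta))$ to $(c+\delta, f(c+\delta))$ has slope exactly $2c$ for every $\delta > 0$, because $(c+\delta)^2-(c-\delta)^2 = 4c\delta$. Those lines do approach the tangent, but only by shifting onto it, since their slopes are already right. With $f(x) = x^3$ the slope is $3c^2 + \delta^2$, which visibly closes in on $3c^2$. Either way the division is by $2\delta \neq 0$.

```python
from fractions import Fraction

def chord_slope(f, c, delta):
    """Slope of the line from (c - δ, f(c - δ)) to (c + δ, f(c + δ)), for δ > 0."""
    if delta <= 0:
        raise ValueError("δ must be positive, so we never divide by zero")
    return (f(c + delta) - f(c - delta)) / (2 * delta)

def square(x):
    return x ** 2

def cube(x):
    return x ** 3

c = Fraction(3)  # assumed sample point; the answer only advises c != 0
for n in range(5):
    delta = Fraction(1, 10 ** n)
    # x**2: always exactly 2c = 6.  x**3: 3c**2 + δ**2, closing in on 27.
    print(float(delta), float(chord_slope(square, c, delta)), float(chord_slope(cube, c, delta)))
```

If the students should see the $x^2$ slopes change too, a one-sided chord from $(c, f(c))$ to $(c+\delta, f(c+\delta))$ gives $2c + \delta$ instead.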

The point of the exercise is not primarily to show that there is a tangent here. Of course that is useful too for an intuitive understanding in the case that $f$ is a school-level "simple" function. But the main point is that it is obvious to see that even though $\delta$ gets ever smaller, and $\epsilon$ does so too, there is no danger of ever dividing anything by zero.

Take as long as they need to understand this very simple beginner concept. Then pick one of the other aspects, for example the formal $\delta-\epsilon$ definition. Or you can do a hard switch and talk about limits of series, just to drive home the point that a "limit" is never something that happens at a particular point (be it at some $c$ where you want to calculate $f'(c)$, or in the case of a limit towards infinity); rather, "limit" is shorthand for a "moving target" as you approach some value (or go on forever, in the case of infinity).

If I recall correctly, we actually did derive the usual common limits in school by plugging things into the formal definition; i.e. the "rules" were not simply written down and then followed by rote. You definitely can try that as well, you do not need uni level education for that.

$\endgroup$
4
$\begingroup$

I am the father of a 14-year-old and a 17-year-old in France. I have a rudimentary understanding of math from my studies (PhD in physics), and I have always used math as a useful toolbox.

When my older kid got to derivatives (during COVID, which in France was an educational disaster), he was disappointed because he could not understand what they were after reading his book.

That was very understandable: the first sentence is about a function from ℝ to ℝ, and then comes a fraction with a lim in front.

To me, this is the worst way to teach something; physics was guilty of the same thing: drop a formula, and good luck with that.

I started with a physical example (velocity, then heat), showing how the derivative is built. The limit was quite easily digested, and then he had a eureka moment when he understood that a derivative is a function factory: a function comes in, and a completely different function comes out, carrying some interesting features of the input one.
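The "function factory" picture translates almost literally into code. In this minimal Python sketch, the names `derivative` and `position`, the step `h`, and the falling-object formula $s(t) = 4.9\,t^2$ are all assumptions for illustration: `derivative` takes a function and returns a new function. It only approximates the true derivative, using a small but nonzero step, and that gap is exactly what the limit later closes.

```python
import math

def derivative(f, h=1e-5):
    """A 'function factory': takes f and returns a new function that
    estimates f' by a symmetric difference quotient with a small step h > 0."""
    def f_prime(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime

def position(t):
    """Assumed example: distance fallen in metres after t seconds, 4.9·t**2."""
    return 4.9 * t ** 2

velocity = derivative(position)       # a function goes in, a new function comes out
acceleration = derivative(velocity)   # and the output can be fed back in
cosine_estimate = derivative(math.sin)

print(velocity(2.0), acceleration(2.0), cosine_estimate(0.0))  # about 19.6, 9.8, 1.0
```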

All this was purposely discussed with a lot of hand waving (as we do in physics), sacrificing precision on the altar of understanding.

He got everything, came back to the initial formula which then was clear, and easily went through the topic.

$\endgroup$
1
  • 2
    $\begingroup$ A function factory is a good coinage, thank you $\endgroup$
    – FShrike
    Commented Jul 29, 2021 at 20:22
4
$\begingroup$

On the derivative paradox.

I think it is essential to raise doubt in the students' minds about what the OP calls "the derivative paradox," at least in the case of instantaneous velocity, which is what I use to introduce the derivative. I've related before the story of a friend of mine who got a fortune cookie whose fortune read, "Never try to prove what no one doubts." The doubt motivates whatever technical move is needed, whether for a theorem, its proof, or a definition, such as using a limit to not let the denominator reach zero, as Will Orrick observed.

No student fails to grasp the paradox, and part of that paradox is the feeling that there must be such a thing as instantaneous velocity. "Everyone has or believes to have an idea of velocity," said Lagrange in criticizing one of the two common approaches to analysis at the time. Indeed, most textbooks (that I've read) omit the paradox and latch directly onto our intuition: "The idea of instantaneous velocity makes intuitive sense, but care is required to define it precisely." (Rogawski/Adams/Franzosa) "We assume from watching the speedometer that the car has a definite velocity at each moment, but how is the 'instantaneous' velocity defined?" (Stewart/Clegg/Saleem) Not only do they assume it exists, they implicitly assume it is the limit of the average velocity: "...[W]e can approximate the desired quantity by computing the average velocity over the brief time interval of a tenth of a second..." (Stewart/Clegg/Saleem). In similar fashion, it is common to assume the slope of the secant line approximates the slope of the tangent line, without any prior definition of tangency. It seems fine to me to make such assumptions. Textbooks, and therefore I assume many teachers, make plenty of them in teaching first-time calculus, just as I do. I have two criticisms. One is that what is being assumed, defined, and proved should be made clear. Many of the textbooks are so informal at the beginning (introduction) of the calculus that it is sometimes unclear which of the three is happening with regard to instantaneous velocity and the slope of the tangent line. The other is this: if the idea of instantaneous velocity makes clear, intuitive sense, why did it take so long to define it precisely? That is a long and interesting investigation, which I won't go into, but the point is that there is a paradox like the OP's which cannot easily be dismissed as being intuitively obvious.
As others have pointed out, it is the definition of the derivative that is meant to get the student past the paradox.

This Q&A came to my attention because of the update by the OP that included a sad assessment, "Everything I thought might be a problem ended up as a non-issue because no-one challenged anything." Ever since graduate school, when the younger sister of one of my students greeted me on being introduced by saying "Math is bogus!", I have encountered such attitudes, and I have always taken such reactions seriously. I think such attitudes toward math contribute to students' feeling that challenging anything in math is pointless. I teach in the US, and, of course, not every student has the same experience. The OP mentions "A level," which I take to mean they are working within the British educational system or one based on it. I do not know what difference that makes, though. I know that students might leave my class with a math-is-bogus attitude, despite my attempts to show them true, non-bogus mathematics, because they come into the course with preconceived notions, and bogus math feels comfortable when they are good at getting right answers.

What I have to share here is a different approach that complements other answers. Certainly I think the derivative paradox is handled adequately by the definition of the limit. However, I think we can do better. A rich understanding of mathematics comprises more than the logical structure created by our choices of axioms and definitions. The connections that motivate our choices are important. As Proclus said, "...mathematics, though beginning with reminders from the outside world, ends with ideas that it has within...." Often what motivates our choices becomes clear only on reflection, after we try out a definition and see why it succeeds. I offer an approach to derivatives motivated by an intuition that what we would like is for the average velocity of a smoothly moving object to vary continuously.


An approach to the derivative paradox

Does anyone have any ideas, experience-backed, for how to take a curious student past the “derivative paradox”?

Perhaps you will like the following approach, which I adapted from Stephen Kuhn (The Derivative à la Carathéodory. Am. Math. Monthly, 1991) after seeing Ray Mayer's notes for Math 112 at Reed College. I gave Kuhn's idea a slight twist. Kuhn's small observation that the average rate of change between a variable point $x$ and a fixed point $c$ can be extended to a continuous function at $c$, in the case where $f$ is differentiable at $c$, is key to the kind of model of change we want to create in calculus/analysis. Our experience of the real world leads us to imagine that in continuous change, such as we experience in observing the motion of an object, velocity changes continuously when an object moves smoothly. Treat these observations as intuitions, and ask: how can I create a mathematics that models what I imagine I see? Once we create such a mathematics, we can analyze which functions model smooth change and which do not.

Building on our notion of average velocity or rate of change, the change in position over the change in time, there is a fairly clear way to proceed, as set out by Carathéodory (Theory of Functions of a Complex Variable, AMS/Chelsea, 2001; orig. 1954; German ed. 1950). We may define differentiability and the derivative as follows:

Given a function $f$ defined in an open interval containing a given number $c$, if we can find a function $f^\Delta$ such that (1) $f^\Delta(x) = [f(x)-f(c)]/(x-c)$ for all $x\ne c$ in the interval and (2) $f^\Delta$ is continuous at $c$, then $f$ is said to be differentiable at $c$ and the derivative of $f$ at $c$ is given by $f'(c)=f^\Delta(c)$.

Example 1 (differentiability). Let $f(x) = x^2$. Fix a number $c$ (any real number). Then $[f(x)-f(c)]/(x-c) = (x^2-c^2)/(x-c)=x+c$. Now, $f^\Delta$ defined by $f^\Delta(x)=x+c$ is continuous at $c$, and it is equal to the average rate of change of $f(x)$ for $x \ne c$. Hence we have found the required function, and therefore $f'(c)=2c$.

Note that the algebraic mechanics of finding $f^\Delta(x)$ is the same as finding the limit in the limit formulation of the derivative. Indeed, it is easy to prove they are equivalent. OTOH, the conceptual goal is to find a continuous model for the average rate of change in a neighborhood of $c$. At no time are we concerned with a limit of the indeterminate form $0/0$.

Discussion. The process works this way for the standard algebraic examples used to introduce the derivative. I think that is sufficient to recommend it. In terms of mechanical algebraic operations, there is not much difference, which may leave you thinking that the difference is negligible. I might add that the "recipe" is to start with the average rate and analyze it. For an algebraic function, factor out $x-c$ and reduce to a continuous function. I like that we focus the discussion on the average rate and the question of its continuity instead of on a limit formula. (When working from the limit definition of the derivative, one could structure the limit computation to start with the average rate, but it seems natural to start with the formula given in the definition.) See Example 5 below for a transcendental function example. I do like to give an example of nondifferentiability, not only because it is good practice to show objects on both sides of the boundary drawn by a definition, but because of how it reflects on the goal of finding a continuous model for the average rate of change; see Example 4. First, Example 1 may be extended as follows.

Example 2 (mutatis mutandis). Let $f(x) = x^n$ for an integer $n>2$. Fix a number $c$ (any real number). Then $[f(x)-f(c)]/(x-c) = (x^n-c^n)/(x-c)=x^{n-1}+c\,x^{n-2}+\cdots+c^{n-1}$, a polynomial. Now, $f^\Delta$ defined by this polynomial is continuous at $c$, and it is equal to the average rate of change of $f(x)$ for $x \ne c$. Hence we have found the required function, and therefore $f'(c)=n\,c^{n-1}$, the value of the polynomial at $x=c$.
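For polynomials, the function $f^\Delta$ can be computed mechanically: dividing $f(x)-f(c)$ by $x-c$ with synthetic division (Horner's scheme) always leaves remainder $0$, and the quotient is a polynomial, hence continuous, and it equals the average rate of change for $x \ne c$. A Python sketch under that observation; the helper names are hypothetical, and coefficients are listed from the highest power down.

```python
from fractions import Fraction

def slope_function(coeffs, c):
    """Coefficients of f^Δ for the polynomial f with the given coefficients
    (highest power first), using synthetic division of f(x) - f(c) by (x - c).
    The remainder is always 0, and the quotient polynomial is f^Δ."""
    quotient = []
    acc = 0
    for a in coeffs[:-1]:   # the constant term only affects the (zero) remainder
        acc = acc * c + a
        quotient.append(acc)
    return quotient

def evaluate(coeffs, x):
    """Evaluate a polynomial (highest power first) by Horner's rule."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

cube = [1, 0, 0, 0]                   # f(x) = x**3
c = 2
f_delta = slope_function(cube, c)     # [1, 2, 4], i.e. x**2 + 2x + 4
print(f_delta, evaluate(f_delta, c))  # f'(2) = f^Δ(2) = 12

# Away from c, f^Δ agrees with the average rate of change:
x = Fraction(5, 2)
print(evaluate(f_delta, x) == (evaluate(cube, x) - evaluate(cube, c)) / (x - c))
```

With `[1, 0, 0]` (that is, $x^2$) the quotient is `[1, c]`, the $x + c$ of Example 1.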

Example 3 (general rule). Let $f^\Delta$ and $g^\Delta$ be the continuous extensions of the average rates of change of functions $f$ and $g$ respectively, assumed to be differentiable at a number $c$. Then it's easy to show $f^\Delta+g^\Delta$ is the required "Delta" function to show the differentiability of $f+g$ at $c$. Exercise left for the reader.

Example 4 (singularity). Suppose a ball bounces off a wall, such that the ball initially had some constant velocity, say $v$, and after the bounce the velocity was constant but $-v$. The position would be given by $f(t)=-v\,|t-c|$, if we set the time of the collision to be at $t=c$. The average velocity between $t$ and $c$ is $$ \cases{\phantom{-}v & if $t<c$; \cr -v & if $t>c$. \cr} $$ Since the limit as $t \rightarrow c$ does not exist, there can be no function $f^\Delta(t)$ that is continuous at $c$ and equal to the average velocity for $t \ne c$. Thus the position is not a differentiable function at $c$, and we do not have a way to define the derivative at $c$.
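A short exact-arithmetic sketch of Example 4, assuming the sample values $v = 3$ and $c = 1$ (function names hypothetical): the average velocity on either side of the bounce is $v$ or $-v$ however close $t$ is to $c$, so no value at $t = c$ can make it continuous there.

```python
from fractions import Fraction

def position(t, v, c):
    """The bounce model from Example 4: f(t) = -v·|t - c|."""
    return -v * abs(t - c)

def average_velocity(t, v, c):
    """[f(t) - f(c)]/(t - c), defined only for t != c."""
    return (position(t, v, c) - position(c, v, c)) / (t - c)

v, c = Fraction(3), Fraction(1)  # assumed sample values
for n in range(1, 5):
    h = Fraction(1, 10 ** n)
    # Before the bounce the average is v, after it -v, no matter how small h is.
    print(float(h), average_velocity(c - h, v, c), average_velocity(c + h, v, c))
```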

Discussion. One might argue that the average of the two velocities, a velocity of $0$, makes sense as a way to define the derivative since the wall is at rest and the ball is in contact with the wall at $t=c$. (Compare with the central difference formula.) But looking to the physics in this case is problematic, since rebounds are quite complicated phenomena that are not instantaneous. To rely on the physics requires a more complicated and reliable model than the absolute value. To refocus our attention, it is the nondifferentiability of the absolute value that is our interest. We may like to imagine that velocity is an inherent property of an object, like its color, and present at each instant. As far as our model of an instantaneous bounce goes, it's probably best to say the instantaneous velocity at $t=c$ is undefined. Note that this conclusion is about our model and not about the physics of bouncing; however, the nondifferentiability of absolute value is a consequence of our choice to base differentiability on extending the average rate of change to a continuous function.

Viewing differentiability as the continuity of the average rate adds some interest to the standard pathological example $g(x) = x^2 \sin(1/x)$ for $x \ne 0$, $g(0)=0$. For any $c$, there is a continuous $g^\Delta(x)$ that equals $[g(x)-g(c)]/(x-c)$ for $x \ne c$ — continuous, as it turns out, for all $x$. So $g'(c)$ is defined for all $c$. But despite the continuity of $g^\Delta(x)$ for each $c$, $g'$ is not itself continuous at $0$. Later one can spiral back to take up the continuity of $g^\Delta$ as a function of $x$ and $c$ in multivariable calculus.
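A floating-point sketch of why $g'$ fails to be continuous at $0$ (helper names hypothetical). For $x \ne 0$, the product and chain rules give $g'(x) = 2x\sin(1/x) - \cos(1/x)$, while $g'(0) = 0$ because $|g(h)/h| = |h\sin(1/h)| \le |h|$. Sampling at $x = 1/(2n\pi)$ and $x = 1/((2n+1)\pi)$ gives values near $-1$ and $+1$ arbitrarily close to $0$.

```python
import math

def g(x):
    """x**2·sin(1/x) for x != 0, and g(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def g_prime(x):
    """2x·sin(1/x) - cos(1/x) for x != 0 (product and chain rules); g'(0) = 0."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for n in (10, 100, 1000):
    a = 1 / (2 * n * math.pi)        # sin(1/a) = 0 and cos(1/a) = 1, so g'(a) ≈ -1
    b = 1 / ((2 * n + 1) * math.pi)  # sin(1/b) = 0 and cos(1/b) = -1, so g'(b) ≈ +1
    print(f"{a:.2e} {g_prime(a):+.6f}   {b:.2e} {g_prime(b):+.6f}")

h = 1e-6
print(g(h) / h)  # the difference quotient at 0 is h·sin(1/h), at most |h| in size
```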

Remark on $\sin x$ and friends.

Once you get past the introduction of the idea of a derivative, one can simply use the limit formulation of the derivative to find the value for $f'(c)$ at the missing point $x=c$ that makes the average rate continuous at $c$ (or show no such value exists). But if you want to continue with the approach in example 1, the transcendental functions offer challenges that are familiar when using the limit formula. The two processes never need be very different, after all.

Example 5. So for $\sin x$, one can show geometrically that if $x$, $c$ lie in the same quadrant, $\sin x - \sin c$ lies between $\cos x\cdot (x-c)$ and $\cos c \cdot (x-c)$. One can use the squeeze theorem then to show $\cos c$ extends $[\sin x - \sin c]/(x-c)$ to a continuous function at $c$.
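A numerical sketch of the squeeze in Example 5, assuming the sample point $c = 0.7$ in the first quadrant (helper names hypothetical). Dividing the stated inequality by $x - c$ (which flips it when $x < c$, hence the order-free `between` check) places the average rate between $\cos x$ and $\cos c$, and both bounds tend to $\cos c$.

```python
import math

def average_rate_sin(x, c):
    """[sin x - sin c]/(x - c), defined for x != c."""
    return (math.sin(x) - math.sin(c)) / (x - c)

def between(value, a, b, tol=1e-12):
    """True if value lies between a and b (in either order), up to rounding."""
    return min(a, b) - tol <= value <= max(a, b) + tol

c = 0.7  # assumed sample point in the first quadrant
for h in (0.5, 0.1, 0.01, 0.001):
    for x in (c - h, c + h):   # all of these stay inside (0, π/2)
        assert between(average_rate_sin(x, c), math.cos(x), math.cos(c))
    print(h, average_rate_sin(c + h, c), math.cos(c))  # the rate closes in on cos c
```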

$\endgroup$
3
  • 1
    $\begingroup$ This is an extremely thoughtful answer, probably one of the best so far. Thank you. I will probably take up tutoring again in the foreseeable future and will try to bear this in mind. And yeah, I’m in the UK school system, finishing in a month! $\endgroup$
    – FShrike
    Commented May 20, 2023 at 7:42
  • $\begingroup$ Do you have a reference showing how the derivative a la Caratheodory can be used with several variables? $\endgroup$ Commented Jun 12, 2023 at 19:34
  • 1
    $\begingroup$ @MichałMiśkiewicz There's Acosta/Delgado "Fréchet vs. Carathéodory," Am. Math. Monthly (1994) jstor.org/stable/2975625, which is a response to Kuhn's article. $\endgroup$
    – user1815
    Commented Jun 12, 2023 at 20:42
2
$\begingroup$

Hopefully, they know some basic physics, e. g., that if you release a heavy object, its velocity grows uniformly, and after $t$ seconds it will have velocity $gt$. Now, ask them, if they had to prove this experimentally, how would they go about it? They probably will come up with some experimental design involving stopwatch, then you point out that they are in fact measuring average velocities and not momentary velocities. Then you may discuss how you get better and better validation of the uniform acceleration law by taking finer instruments and smaller intervals between measurements, but ultimately there's some kind of "right answer" that you are approaching with these improvements.
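The experiment can be simulated exactly. In this sketch $g = 9.8\ \text{m/s}^2$, the moment $t = 2$ s and the interval lengths are assumed values, and the function names are hypothetical. Each "stopwatch" measurement is an average velocity over a nonzero interval, which works out to $g t + g\,\Delta t/2$: it overshoots $g t$ by an amount that shrinks with the interval.

```python
from fractions import Fraction

G = Fraction(98, 10)  # assumed g = 9.8 m/s², kept exact as a fraction

def distance_fallen(t):
    """Distance fallen after t seconds from rest under uniform acceleration: g·t²/2."""
    return G * t ** 2 / 2

def average_velocity(t, dt):
    """What a stopwatch measurement gives: distance over a nonzero time interval."""
    return (distance_fallen(t + dt) - distance_fallen(t)) / dt

t = Fraction(2)  # assumed moment of interest, in seconds
for dt in (Fraction(1), Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    # Algebraically this is g·t + g·dt/2, which closes in on g·t = 19.6 m/s.
    print(float(dt), float(average_velocity(t, dt)))
```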

The obvious advantage here is that they have already an internalized notion of momentary velocity and have no intuitive doubts that such a thing exists.

$\endgroup$
1
  • $\begingroup$ Yes, I’ve made the analogy with physics before when people have asked me “what even is calculus?” And I make some hand waving at “well, if I drop this pen, it’s moving smoothly, over instants, isn’t it? And if a physicist wants to know about smooth motion instead of discrete motion, that’s what calculus is for” $\endgroup$
    – FShrike
    Commented Jul 30, 2021 at 13:14
2
$\begingroup$

my question is about how to specifically soothe a curious student quibbling about the derivative “paradox” in the same way that I once (detrimentally, with no one to guide me) did. I’m not interested in general ideas of whether we should or shouldn’t push rigour into introductory calculus.

Do not soothe them. Encourage them. Cheer them on as they fight this fight.

No one will challenge your own understanding of a subject like a curious student. But rather than fear this challenge, embrace it, because if you listen well you're about to discover whole different ways to think about this. This doesn't mean your way of thinking about it is wrong. It's simply not the only correct way.

What the student is searching for is a way to think about this that they can fit into their own way of thinking about math. Now sure, they need to think about it correctly but they need a correct way that works for them. The problem is, you likely have no idea what that is. But that's fine. Just listen to them. When you spot misconceptions correct them gently. Maybe with your own penetrating questions, thought experiments, and stories. Be prepared to be wrong. Take it with grace. Don't dictate how they think. Judge their methods by the results. Hinting at your way of thinking may help but don't insist on it. Make it fun.

As an example let me tell you I've successfully tutored a Summa Cum Laude student in mathematics. I've tutored, in a university mathlab, calculus, linear algebra, differential equations, and statistics. And I still can't do times table flash cards.

Yeah my brain just doesn't work that way. I've found ways around that problem. I know a lot of patterns that let me work out the answer but some cards still leave me stammering.

Yet after working in the lab I can factor fairly large numbers in my head.

So yeah, I'm weird. And it makes it hard to teach me. Because I come at the typical problems in very different ways. And some teachers and tutors can't handle that.

But what I've learned over the years is that I'm not that weird. Many people have their own styles that play to their strengths rather than their weaknesses.

Because of that my advice is to listen to them. Learn what you can about how they think and how they model this problem in their heads. Test what they can relate to and use that.

“but aren’t we just dividing by zero?”

This is a voice calling out from many previous lessons about this being undefined. You are about to change how they look at the whole universe so don't trivialize this question. Take them seriously.

"Yes we are. Kinda. But we're telling the story of how you get to it. Without that story it's still undefined. We're trying to keep zero from destroying that information."

What that answer is attempting to do is let them know that their previous lessons weren't useless lies. They were lies leading to a deeper more useful lie because that's what all models are. If they weren't they'd be the real thing.

That answer might not land with them. But it is trying to speak to them at the same level. They want a simple explanation of how this fits with what they learned before and what's changed since then. Find some way to explain that to them that will stand up to the lessons yet to come.

$\endgroup$
1
  • 2
    $\begingroup$ Many good sentiments here; I wholeheartedly agree. Good answer $\endgroup$
    – FShrike
    Commented Jul 30, 2021 at 19:53
2
$\begingroup$

I'm not sure how I missed this question when it was first posted, and it already has many answers, but I feel there are some important things that still need to be said to address your specific concerns.

“but aren’t we just dividing by zero? ... why are we doing maths where there are zeros on both the numerator, and the denominator, that's undefined,... right?” I can't claim to be giving an "experience-based" answer since, in my years of teaching calculus, I can't recall ever having been asked that question. Maybe it doesn't actually bother most students, or perhaps my teaching approach headed this off at an early stage, or perhaps students found it difficult to approach me; my classes were extremely large. For whatever reason, it never came up. I do understand from my own learning of calculus why it bothered you, though, and if the question does arise, I think it needs to be addressed in an honest and direct way.

What I would emphasize is that the processes of taking limits and derivatives are defined so that this is explicitly ruled out. You don't have to give the formal definition of limit at first---an intuitive approach is fine---but you do need to tell them that when they see the formal definition of $\lim_{h\to0}$ later, they will find that $h=0$ is expressly forbidden. Specifically---and you may not want to mention these details at first---the definition makes use of a positive (not 0) number $\delta$ such that $h$ lies either in the interval $(-\delta,0)$ or the interval $(0,\delta)$. In other words, the whole set-up is designed so that you are protected from $h$ ever being $0$. To keep things simple, you can say that when taking the limit the value of $h$ is considered small, but not $0$, and emphasize the "not $0$" part repeatedly.

Contrast this with algebra, where in dealing with $\frac{x+2}{x}$ or $\frac{x^2+x}{x}$ and no domain explicitly mentioned you have to take care about what would happen if $x$ were $0$. In the definition of derivative you don't have to worry because the $h$ in $\lim_{h\to0}$ is defined not to be $0$. That means you are free to do cancelations of factors of $h$ that would require much more care in an algebra class.

Limits are used in the definition of the derivative so that you can talk about what happens as $h$ comes arbitrarily close to $0$, without equalling 0. Velocity is a key application of the derivative, and I ask students to think about how they would estimate the instantaneous velocity of a moving object given a series of time-stamped still photos of the object at closely spaced points in its motion. I point out that it is impossible to make the estimate using only a single photo---you have to have two photos taken at slightly different times and look at how far the object moved in between. This observation provides a physical reason why we impose the condition $h\ne0$. Of course, any computation made based on photos spaced apart by any finite time interval can only be an approximation. Taking the limit can be thought of as finding the value these approximations approach in the process of shrinking the interval. (If such a value exists.)

"why is $(\delta x)^2$ "smaller" than $\delta x$? $0$ is just $0$, right?" This, of course, relates to the other question, and the answer is that $\delta x$, which I have been calling $h$, is never $0$. Your proof, which is perfectly correct (apart from the little-o versus big-O issue), makes no mention of $(\delta x)^2$ being less than $\delta x$, and there's no reason to talk about that. I think what you are bothered by is the instruction to "neglect lower order terms" that one commonly sees in calculus and physics books. My advice is simply not to use that language. Students will pick up through experience which terms are going to go to zero when the limit is taken, and can start neglecting such terms on their own once they have sufficient practice. As beginners, however, they should write out all the terms and compute which terms go to $0$ and which don't.

I should add that you are right that $h^2$ is not smaller than $h$ when $h=0$. It's also not smaller when $h\ge1$ (or when $h<0$). What is true is that $h^2<\lvert h\rvert$ for $h\in(-1,0)\cup(0,1)$. As mentioned above, in the $\epsilon$-$\delta$ definition of $\lim_{h\to0}$ a positive number $\delta$ is used to bound $h$: $h\in(-\delta,0)\cup(0,\delta)$. The definition doesn't explicitly require that $\delta$ be less than $1$, and for large $\epsilon$ it may not need to be, in which case $h^2<\lvert h\rvert$ wouldn't always hold; but as $\epsilon$ is made smaller, $\delta<1$ ends up having to be true. The point here is that, yes, eventually $h^2$ is less than $\lvert h\rvert$, but that fact plays no explicit role in your proof, and students don't need to worry about it at first.
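A tiny sketch confirming where $h^2 < \lvert h\rvert$ holds, using exact fractions on a few sample values (the sample list and function name are hypothetical choices).

```python
from fractions import Fraction

def square_below_abs(h):
    """True exactly when h**2 < |h|, i.e. when 0 < |h| < 1."""
    return h * h < abs(h)

samples = [Fraction(-2), Fraction(-1), Fraction(-1, 2), Fraction(0),
           Fraction(1, 1000), Fraction(1, 2), Fraction(1), Fraction(3)]
for h in samples:
    print(h, square_below_abs(h))
```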

Now eventually students will have to get past the $\epsilon$-$\delta$ hurdle. Some other answerers have said that students simply can't hope to understand what's going on until they've jumped over that hurdle, but in my opinion that's not necessary, and forcing it will place an unnecessary barrier to understanding in front of many students. The $\epsilon$-$\delta$ definition is difficult because it seems somewhat cumbersome, and students don't grasp at first why it needs to be so cumbersome. Informally, the meaning of $\lim_{x\to c}f(x)=L$ is often stated as "$f(x)$ gets closer and closer to $L$ as $x$ gets closer and closer to $c$." For the power functions $x^n$ that you have in mind this is perfectly adequate: $\frac{(x+h)^2-x^2}{h}$ really does get closer and closer to $2x$ as $h$ gets closer and closer to $0$. It's for limits like $\lim_{h\to0}h\sin(1/h)=0$ that you have to take greater care about how you state things. In this example the function actually gets all the way to the limiting value $0$ infinitely many times as $h\to0$ from the positive side, but each time it moves away from that value again before turning around and heading back toward it. In this kind of situation, the "gets closer and closer" language is too imprecise and, in fact, incorrect if interpreted literally. So here you really need the idea that eventually the function gets inside an $\epsilon$-sized envelope surrounding the limiting value, and stays there. I think it is good for students to see examples like this so they know why the "closer and closer" language is inadequate and why a more precise definition is needed. But again, these issues are separate from your concerns and don't need to be overcome before the questions you raise can be answered.
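A floating-point sketch of the $h\sin(1/h)$ example (function name hypothetical): it hits $0$ at $h = 1/(n\pi)$, moves away again (to $\pm h$ at $h = 1/((n+\frac12)\pi)$), yet always stays inside the envelope $|h\sin(1/h)| \le |h|$, so taking $\delta = \epsilon$ works in the $\epsilon$-$\delta$ definition.

```python
import math

def f(h):
    """h·sin(1/h), defined for h != 0."""
    return h * math.sin(1 / h)

for n in range(1, 6):
    zero_point = 1 / (n * math.pi)          # sin(nπ) = 0, so f reaches the limit 0 here
    peak_point = 1 / ((n + 0.5) * math.pi)  # sin((n + 1/2)π) = ±1, so f = ±h here
    print(f"{zero_point:.4f} {f(zero_point):+.1e}   {peak_point:.4f} {f(peak_point):+.4f}")

# What does hold: |f(h)| <= |h|, so |f(h)| < ε whenever 0 < |h| < δ = ε.
```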

$\endgroup$
1
  • 1
    $\begingroup$ This is all good advice, thank you. Unfortunately I was never allowed to tutor due to an age restriction, but hopefully I’ll have a chance to make use of it next year $\endgroup$
    – FShrike
    Commented May 27, 2022 at 15:01
2
$\begingroup$

After seeing your edit, I'd like to say something I thought of writing when I first saw this but didn't: maybe it's not that the students don't care, but rather that they never ran into the same problem you did. I know I didn't have the same problem when I learned calculus without epsilons. There is no "derivative paradox" that can only be cleared up with the epsilon-delta definition of limit, and I never thought I was dividing by zero. It's possible to understand the essential idea of limits without a rigorous definition. I'd even argue that you have it all backwards: you can't make sense of the definition of limits unless you already understand them!

$\endgroup$
5
  • $\begingroup$ Maybe the edit was poorly phrased. I wanted to convey the following: no one in school questions anything, at all (except for the rarities who would study pure maths seriously in the future). $\endgroup$
    – FShrike
    Commented May 10, 2023 at 19:37
  • 1
    $\begingroup$ @FShrike your observation doesn't surprise those who have taught math for a while. Think about learning other skills. People who learn to ride a bike or drive a car are rarely concerned with understanding what makes it work; their goal is to become a practical user. (How bikes work has some surprises: see youtube.com/watch?v=9cNmUNHSBac.) Someone not concerned with pure math can learn the point of derivatives just with pictures, so what once bothered you would never bother them. If you teach math some day, remember that explanations that are right for you may not be right for others. $\endgroup$
    – KCd
    Commented May 20, 2023 at 22:07
  • $\begingroup$ @KCd I forgot to respond at the time, but this was a very good point. I might say that I would at least acknowledge ignorance of how bikes work, and that I would acknowledge there is more to it than meets the eye - I certainly could not build a bike. But I also felt frustrated that no one (not even any teachers) wanted to acknowledge there was more to the school calculus than merely 'shut up and calculate'. $\endgroup$
    – FShrike
    Commented Jul 30, 2023 at 11:13
  • $\begingroup$ @FShrike All technical subjects have deeper levels, but students who are uninterested in the subject typically don't want to go deeper. Teachers who have taught their subject matter only at a high school level for many years may not remember the deeper levels anymore (or they do remember it was tough going the 1st time and never understood it). In chemistry, the ultimate explanation of atomic behavior needs quantum mechanics, but someone who taught HS chemistry for 25 years is unlikely to be up for discussing quantum mechanics in a technical way with a student who asks "why" about everything. $\endgroup$
    – KCd
    Commented Jul 30, 2023 at 18:35
  • 1
    $\begingroup$ @FShrike At the university level, the faculty in technical subjects who do research and advise graduate students are going to know all the technicalities of the basic undergrad courses and would be open to talking with you about how deep the material can go. $\endgroup$
    – KCd
    Commented Jul 30, 2023 at 18:38
1
$\begingroup$

Personally, I find that calculus is done on two parallel tracks: (1) the essential ideas of what derivatives and integrals are, and (2) the $\epsilon$-$\delta$ language that makes them rigorous. Sometimes a mental "translation" between the two is needed when one tries to understand or compose a proof.

I would like to recommend "Calculus Made Easy" by Silvanus P. Thompson, ideally the edition with Martin Gardner's notes. It establishes the essential ideas of calculus in the original language of Newton's time.

For the "dividing by 0" paradox, I would recommend https://math.stackexchange.com/questions/12906/the-staircase-paradox-or-why-pi-ne4 as an example showing that what matters here is the ratio of two infinitesimal quantities.
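The simplest version of the staircase paradox can be sketched numerically. This assumes the unit-square case: a staircase of $n$ equal steps from $(0,0)$ to $(1,1)$. The staircase gets uniformly close to the diagonal, yet its length stays $2$ rather than approaching the diagonal's $\sqrt2$. The reason is that each step's length divided by its horizontal width is always $2$, never $\sqrt2$. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def staircase_length(n):
    """Total length of an n-step staircase from (0, 0) to (1, 1).

    Each step is 1/n across and 1/n up, so each contributes 2/n.
    """
    step = Fraction(1, n)
    return n * (step + step)


def max_gap_from_diagonal(n):
    """Largest distance from the staircase to the diagonal y = x.

    It is attained at the step corners, which sit 1/n off the diagonal
    in one coordinate, i.e. at distance (1/n) / sqrt(2).
    """
    return (1 / n) / math.sqrt(2)


for n in [1, 10, 100, 1000]:
    print(n, staircase_length(n), max_gap_from_diagonal(n))

print("diagonal length:", math.sqrt(2))
```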

$\endgroup$
1
$\begingroup$
  1. There is a reason why the binomial theorem and calculus were both studied by Newton: they are closely related!

  2. There are computational methods that work consistently and then there are proofs that these methods are "correct". This represents two different approaches to mathematics. Many working mathematicians were (in their youth) enchanted by proofs and so that has become the dominant approach. It does not have to be so!

  3. A lot of early mathematics is more about computational methods that are consistent. One does not fully justify them with proofs. Practical and heuristic examples are used instead. Counter-examples are used to limit their applicability.

  4. For example, we say that multiplication is repeated addition. To what extent is this proved in school? Proving the basic arithmetic laws even for the integers requires induction. The rational numbers need more work, and decimals or real numbers need far more. We talk about angles and distances being measured even though proving that such measurements have the right properties requires limits. In the case of angles it is even trickier than for distances, since (with straightedge and compass) they can in general only be bisected. So, from the point of view of "proof", a lot is missing in school mathematics. Why are we balking at calculus?!

Coming to your question, I believe it is reasonable to introduce calculus via an algebraic method like: (1) $a(x+h)-ax = ah$ to justify the linear case, followed by (2) $(x+h)(y+g)=xy+(xg+yh)+gh$ as an introduction to the Leibniz (product) rule. When $0<g,h<1$, we have $gh<\min(g,h)$. This can be used to explain why we are uninterested in the remaining term $gh$. One can use similar ideas to justify the integral $\int_a^b x\,dx = (b^2-a^2)/2$ and use this heuristic to justify its extension to $x^n$ for $n>1$.
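Both algebraic steps can be checked with exact rational arithmetic. This minimal Python sketch assumes positive increments $0<g,h<1$ and uses hypothetical helper names. For the integral it uses left Riemann sums, which is one possible heuristic; the answer doesn't specify which one. The sketch shows two things. The leftover term in the product is exactly $gh$. The Riemann-sum shortfall from $(b^2-a^2)/2$ is exactly $(b-a)^2/(2n)$, which shrinks as $n$ grows.

```python
from fractions import Fraction


def product_remainder(x, y, h, g):
    """(x+h)(y+g) - xy - (xg + yh): the term dropped when only first-order changes are kept."""
    return (x + h) * (y + g) - x * y - (x * g + y * h)


def left_riemann_sum_of_x(a, b, n):
    """Left Riemann sum of f(x) = x on [a, b] using n equal strips."""
    width = (b - a) / n
    return sum((a + i * width) * width for i in range(n))


x, y = Fraction(2), Fraction(5)
for h, g in [(Fraction(1, 2), Fraction(1, 3)), (Fraction(1, 10), Fraction(1, 20))]:
    r = product_remainder(x, y, h, g)
    assert r == h * g and r < min(h, g)  # the leftover term is exactly gh, smaller than both
    print(h, g, r)

a, b = Fraction(1), Fraction(3)
for n in [1, 10, 100]:
    s = left_riemann_sum_of_x(a, b, n)
    # The shortfall from (b^2 - a^2)/2 is exactly (b - a)^2 / (2n).
    assert (b ** 2 - a ** 2) / 2 - s == (b - a) ** 2 / (2 * n)
    print(n, s)
```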

TL;DR: The Leibniz rule is "king" as it is a useful computational method. Justifying it on the basis of $\epsilon$-$\delta$ proofs is an afterthought.

$\endgroup$
0
$\begingroup$

What you have described is an indeterminate form. Examples include

  • $\infty-\infty$
  • $0/0$
  • $0^0$
  • $1^\infty$
  • $\infty/\infty$

This means that simply substituting the limiting value does not let you evaluate the limit.
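To see why substitution cannot decide the value, here is a small Python sketch. The three quotients are illustrative choices, not taken from the answer. Each has the form $0/0$ at $t=0$, yet they approach $1$, $0$ and $2$ respectively as $t$ shrinks.

```python
from fractions import Fraction

# Three quotients that all have the form 0/0 at t = 0, yet approach different values.
examples = {
    "t / t": lambda t: t / t,            # tends to 1
    "t**2 / t": lambda t: t ** 2 / t,    # tends to 0
    "(2*t) / t": lambda t: (2 * t) / t,  # tends to 2
}

for name, q in examples.items():
    values = [q(Fraction(1, 10 ** k)) for k in range(1, 5)]
    print(name, [str(v) for v in values])
```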

To give an example

$$\frac{dy}{dx}=\lim_{dx\to0}\frac{f(x+dx)-f(x)}{dx}$$

Substituting $dx=0$ into the right-hand side, we get

$$\frac{f(x+0)-f(x)}{0}=\frac{f(x)-f(x)}{0}=\frac{0}{0}.$$

So, when differentiating from first principles, we have to deal with indeterminate forms.
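A minimal Python sketch of the first-principles computation follows. It assumes the example function $f(x)=x^3$ at $x=2$, which is not from the original. Substituting $dx=0$ literally fails with a division-by-zero error. For small nonzero $dx$, the quotient equals $12+6\,dx+(dx)^2$ and so approaches $f'(2)=12$. The name `first_principles_quotient` is hypothetical.

```python
from fractions import Fraction


def f(x):
    """Example function, assumed for illustration: f(x) = x**3."""
    return x ** 3


def first_principles_quotient(f, x, dx):
    """(f(x + dx) - f(x)) / dx, which is undefined when dx == 0."""
    return (f(x + dx) - f(x)) / dx


x = Fraction(2)
try:
    first_principles_quotient(f, x, Fraction(0))  # literal substitution: 0/0
except ZeroDivisionError:
    print("dx = 0 gives 0/0, which is undefined")

for k in range(1, 5):
    dx = Fraction(1, 10 ** k)
    q = first_principles_quotient(f, x, dx)
    assert q == 12 + 6 * dx + dx ** 2  # exact algebraic simplification
    print(dx, float(q))  # approaches f'(2) = 12
```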

Your question is how we resolve dividing by zero. The answer is that we are not dividing by zero: we are taking limits. Informal definitions of limits are enough here, whether for ordinary limits, infinite limits or limits at infinity. In other words, it is not necessary to understand the $\epsilon$-$\delta$ definition of limits.

Just to add a minor note: students often ask me why $0/0=k$. For some reason they have confused indeterminate forms with arithmetic.

Now the question is how to deal with indeterminate forms. Usually some simplification needs to be done, or a theorem proved. In the case of the power rule, when proving it you should cancel the $dx$ between the numerator and the denominator.

This leaves you with

$$\lim_{dx\to0}\left(nx^{n-1}+\binom{n}{2}x^{n-2}\,dx+\dots\right)$$

All the terms containing $dx$ tend to zero, leaving $nx^{n-1}$.
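The cancellation step in the power rule can be checked term by term. This Python sketch uses the hypothetical name `quotient_terms`, with $n=4$ and $x=3$ chosen only for illustration. It lists the terms $\binom{n}{k}x^{n-k}(dx)^{k-1}$ that remain after dividing by $dx$. Only the first, $nx^{n-1}$, is free of $dx$, so the others shrink along with $dx$.

```python
from fractions import Fraction
from math import comb


def quotient_terms(n, x, dx):
    """Terms of ((x + dx)**n - x**n) / dx after cancelling dx:
    C(n,1) x^(n-1), C(n,2) x^(n-2) dx, C(n,3) x^(n-3) dx^2, ..., dx^(n-1)."""
    return [comb(n, k) * x ** (n - k) * dx ** (k - 1) for k in range(1, n + 1)]


n, x = 4, Fraction(3)
for dx in [Fraction(1, 10), Fraction(1, 1000)]:
    terms = quotient_terms(n, x, dx)
    # The sum of the terms equals the difference quotient exactly.
    assert sum(terms) == ((x + dx) ** n - x ** n) / dx
    # Only the first term, n*x**(n-1), does not contain dx.
    print(dx, float(terms[0]), [float(t) for t in terms[1:]])
```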

You have asked why $(dx)^2$ is smaller than $dx$. Think of the graphs of $x$ and $x^2$: when $x=0.5$, $x^2=0.25$. In short, $x^2$ approaches zero faster than $x$. This is what makes $\lim_{x\to0}x^2/x=0$. The other way to think about it is the same simplification of fractions that we applied earlier in the power rule: here $(dx)^2/dx=dx$.

So
$$\lim_{dx\to0}\frac{(dx)^2}{dx}=\lim_{dx\to0}dx=0.$$
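Here is a short Python sketch of the comparison between $x$ and $x^2$, using exact fractions. The helper `square_and_ratio` is a hypothetical name. It shows $x^2$ shrinking much faster than $x$, while the ratio $x^2/x$ simplifies to $x$ itself.

```python
from fractions import Fraction


def square_and_ratio(x):
    """Return (x**2, x**2 / x) for a nonzero x."""
    square = x ** 2
    return square, square / x


for x in [Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)]:
    square, ratio = square_and_ratio(x)
    assert ratio == x  # the simplification (dx)^2 / dx = dx
    print(f"x = {x}, x^2 = {square}, x^2/x = {ratio}")
```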

To summarise: there is no paradox, though the content can be a bit challenging to understand.

$\endgroup$
-2
$\begingroup$

As an intuitive aid to calculus, I like to explain that infinity is a number so big that it wouldn't make any difference if you made it bigger. This is really an $\epsilon$-$\delta$ argument in a disguise that makes it look friendlier. It can be adapted to any limit situation.

$\endgroup$
