91
$\begingroup$

Honestly, I don't get it. People say it's because it's a postulate. But, I mean, I see people deriving the Schrödinger equation with the help of the wave function, $T+U$ and partial differentials in three space coordinates and one time coordinate. How is that not a derivation? And why can't the Schrödinger equation be derived?

$\endgroup$
2
  • $\begingroup$ I thought you could derive an exact equation, just not for systems with more than two particles. $\endgroup$
    – matt_black
    Commented Jan 16, 2020 at 18:01
  • $\begingroup$ Related: physics.stackexchange.com/q/142169/55751 $\endgroup$
    – Ellie
    Commented Jan 18, 2020 at 15:48

10 Answers

206
$\begingroup$

A derivation means a series of logical steps that starts with some assumptions, and ends up at the result you want. Just about anything can be "derived", as long as you vary what the assumptions are. So when people say "X can't be derived", they mean "at your current level of understanding, there's no way to derive X that sheds more light on why X is true, over just assuming it is".

For example, can you "derive" that momentum is $p = mv$? There are several possible answers.

  • You ask this as a student in introductory physics. Some might say yes. For example, you can start from the kinetic energy $K = mv^2/2$, and then assume $K = p^2/2m$. Combining these equations and solving for $p$ gives $p = mv$, so this is a derivation.
  • You ask this as a student in introductory physics. Some might say no. The above derivation is just nonsense. Starting from $K = p^2/2m$ is basically the same thing as assuming the final result, and if you're allowed to do that, it's no better than just taking $p = mv$ by definition. It's like "deriving" $1 + 1 = 2$ by defining $2$ to be $1 + 1$.
  • You ask this as a student in advanced mechanics. Most would say yes. You start from the deeper idea that symmetries are related to conserved quantities, along with the definition that momentum should be the conserved quantity associated with translational symmetry. Putting these together gives the result.
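
As a hedged sketch of that last route, assume a single particle in one dimension with the textbook Lagrangian $L = \tfrac{1}{2} m \dot{x}^2 - V(x)$ (this specific form is an assumption for illustration, not something fixed by the discussion above). If the system is invariant under translations $x \to x + \epsilon$, then $\partial L/\partial x = 0$, and the Euler-Lagrange equation gives a conserved quantity:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = \frac{\partial L}{\partial x} = 0 \quad\implies\quad p \equiv \frac{\partial L}{\partial \dot{x}} = m\dot{x} = \text{const}$$

Here $p = mv$ comes out as a result rather than going in as a definition, but only because the Lagrangian and the symmetry principle were assumed instead.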

The point is, you can make up a derivation for anything -- but you might not be at a stage in your education where such a derivation is useful at all. If the derivation only works by making up ad-hoc assumptions that are basically as unmotivated as what you're trying to prove, then it doesn't aid understanding. Some people feel this is true for the Schrödinger equation, though I personally think its elementary derivations are quite useful. (The classic one is explained in a later answer here.)


There is often confusion here because derivations in physics work very differently than proofs in mathematics.

For example, in physics, you can often run derivations in both directions: you can use X to derive Y, and also Y to derive X. That isn't circular reasoning, because the real support for X (or Y) isn't that it can be derived from Y (or X), but that it is supported by some experimental data D. This two-way derivation then tells you that if you have data D supporting X (or Y), then it also supports Y (or X).

Once you finish putting high school math on a rigorous foundation, undergraduate math generally builds upward. For example, you can't use Stokes' theorem to prove the fundamental theorem of calculus, even though it technically subsumes it as a special case, because its proof depends on the fundamental theorem of calculus in the first place. In other words, as long as your classes are being rigorous at all, it would be very strange to hear "we can't derive this important result now, but we'll derive it next year" -- that would be in danger of logical circularity.

This isn't the case in physics: undergraduate physics generally builds downward. Every year, you learn a new theory that subsumes everything you previously learned as a special case, which is completely logically independent of those earlier theories. You don't actually need any results from classical mechanics to completely define quantum mechanics: it is a new layer constructed below classical mechanics rather than above it. That's why definitions now can turn into derived things later, once you learn the lower level. And it means that in practice, physicists have to guess the lower level given only access to the higher level; that's the fundamental reason why science is hard!

$\endgroup$
16
  • 13
    $\begingroup$ @Cell Certainly. The point is that as you go deeper in the physics, the deeper layers are logically independent of the shallower ones (since you can use them to derive the shallower ones), but they're not conceptually independent (since they're not intuitive if you don't understand the shallower layers first). $\endgroup$
    – knzhou
    Commented Jan 16, 2020 at 3:38
  • 5
    $\begingroup$ For one thing, it glosses over the changing roles of definitions. I think that most intro students would actually say that you can't derive that momentum is $mv$ because that's a definition, not a postulate. Then later, a "better" (i.e. more general or "elegant") definition of momentum gets introduced. But the layering is not always unique - even among equal experts, you'll often have two equivalent statements, and the experts will disagree which is best thought of as the definition and which is best thought of as the corollary to that definition. $\endgroup$
    – tparker
    Commented Jan 16, 2020 at 4:08
  • 3
    $\begingroup$ In fact, in my opinion the biggest conceptual shift from high-school/pre-med physics to "real" physics is that the former just focuses on conveying true facts, while the latter is also concerned with the logical relations between those facts - clearly distinguishing between definitions, postulates, and theorems. That's why a first mechanics course rarely clearly answers the question physics.stackexchange.com/questions/70186/…. $\endgroup$
    – tparker
    Commented Jan 16, 2020 at 4:17
  • 7
    $\begingroup$ @Cell having prior knowledge of something is not the same as using it as a formal axiom in a logical derivation. $\endgroup$
    – OrangeDog
    Commented Jan 16, 2020 at 13:10
  • 5
    $\begingroup$ @Cell all of which has nothing to do with what a "derivation" is. $\endgroup$
    – OrangeDog
    Commented Jan 16, 2020 at 14:07
43
$\begingroup$

A bit of a different perspective than other answers:

I was once in a strange physics class as an undergraduate, where a 90-year-old professor would mumble to himself while drawing terribly on a tablet connected to a projector. Everyone would get A's by default, so no one paid attention; in fact, some days I would be the only one to show up. But this was "Modern Physics", and I wanted to be a physicist, so I paid attention, trying to learn whatever I could.

One thing I'll never forget:

the old professor said that everyone says that Schrodinger's Equation is an axiom, but you actually can derive it!

Imagine yourself in the shoes of Schrödinger. Experiments are showing that matter has wavelike properties. Are there equations of motion that describe "wavelike behavior"? We know how some waves operate in classical mechanics. Now typically in classical E&M, we throw out the imaginary part of $e^{i (k z - \omega t)}$ to work with $\cos(k z - \omega t)$, but what happens if you simply keep the imaginary part of the plane wave?

If you start off with a plane wave:

$$\Psi = e^{i (k z - \omega t)}$$ and you find its derivative $$\frac{d\Psi}{dt} = -i \omega e^{i (k z - \omega t)}$$

if you use Einstein's idea that energy is quantized into packets (that is, $E = h f \implies f = E/h \implies \omega = E/\hbar$), this becomes:

$$\frac{d\Psi}{dt} = -i \frac{E}{\hbar} e^{i (k z - \omega t)}$$

this immediately becomes

$$i \hbar \frac{d\Psi}{dt} = E \Psi$$

and since the Hamiltonian represents the total energy operator, we can make this:

$$i \hbar \frac{d\Psi}{dt} = H \Psi$$

which is exactly the Schrödinger equation!
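
The step from the plane wave to $i\hbar\, d\Psi/dt = E\Psi$ can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes natural units ($\hbar = 1$), arbitrary illustrative values for $k$ and $\omega$, and a central finite difference in place of the exact derivative. The names `plane_wave` and `time_derivative` are hypothetical helpers.

```python
import cmath

HBAR = 1.0  # natural units; an assumption made purely for illustration


def plane_wave(z, t, k, omega):
    """Psi = exp(i (k z - omega t))."""
    return cmath.exp(1j * (k * z - omega * t))


def time_derivative(f, z, t, dt=1e-6):
    """Central finite-difference approximation to df/dt at (z, t)."""
    return (f(z, t + dt) - f(z, t - dt)) / (2 * dt)


k, omega = 2.0, 3.0   # arbitrary illustrative values
E = HBAR * omega      # Einstein/Planck relation E = hbar * omega


def psi(z, t):
    return plane_wave(z, t, k, omega)


z0, t0 = 0.4, 0.7
lhs = 1j * HBAR * time_derivative(psi, z0, t0)  # i hbar dPsi/dt
rhs = E * psi(z0, t0)                           # E Psi
residual = abs(lhs - rhs)
print(residual)  # tiny: only finite-difference and rounding error remain
```

The check only confirms the algebra for a single plane wave; replacing $E$ by an operator $H$ acting on general states is the extra assumption the derivation makes.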

Now this contradicts what even Feynman says: "Where did we get that (equation) from? Nowhere. It is not possible to derive it from anything you know. It came out of the mind of Schrödinger."

I was curious after class and I asked him some questions about this. No matter what, doesn't there always need to be an axiom? He responded saying that yes, there needs to be a starting point, but this is how he imagines Schrödinger came up with it, since this is a very simple and a natural way of obtaining it using knowledge at the time.

To me what's remarkable about this "derivation", is that you only need to start with two things:

  1. The state you're observing has the form of a plane wave: $\Psi = e^{i (k z - \omega t)}$
  2. And that energy is quantized in packets: $ E = h f$

And that's it! You don't even need de Broglie's hypothesis!


EDIT: Some people are curious why the Hamiltonian for the Schrödinger equation has such a strange form: $$H = -\frac{\hbar^2}{2m}\nabla^2 + V(x)$$ This is also very simple: you just need to plug the definition of the momentum operator into the equation for the Hamiltonian (which classically is just kinetic energy + potential energy)

$$H = \frac{p^2}{2m} + V(x)$$

$$p = -i \hbar \frac{\partial}{\partial x}$$

$$H = -\frac{\hbar^2}{2m}\nabla^2 + V(x)$$

It's that simple!

Now if you are also curious where $p = -i \hbar \frac{\partial}{\partial x}$ comes from, this is also simple. For classical waves, the wave number $k$ plays the role of the momentum; here we take $p = \hbar k$. So if we do what we did before, but now find the derivative with respect to position instead of time:

$$\frac{d\Psi}{dz} = i \frac{p}{\hbar} e^{i (k z - \omega t)}$$

$$\frac{d\Psi}{dz} = i \frac{p}{\hbar} \Psi$$ $$-i\frac{d\Psi}{dz} = \frac{p}{\hbar} \Psi$$

$$p \Psi = (-i\hbar\frac{d}{dz}) \Psi $$

This suggests that any time you use $p \Psi$ you can swap it out with $(-i\hbar\frac{d}{dz}) \Psi$, and this is why people say "The momentum operator is $(-i\hbar\frac{d}{dz}) $ in the position basis."
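
Putting the two operator substitutions together, a free particle ($V = 0$) should satisfy $i\hbar\,\partial_t \Psi = -\frac{\hbar^2}{2m}\partial_z^2 \Psi$, which for a plane wave holds only if $\hbar\omega = \hbar^2 k^2 / 2m$. The sketch below checks this numerically. It assumes $\hbar = m = 1$, an arbitrary $k$, and finite-difference derivatives; `schrodinger_residual` is a hypothetical helper, not a library function.

```python
import cmath

HBAR = 1.0  # natural units, assumed for illustration
MASS = 1.0  # assumed particle mass


def psi(z, t, k, omega):
    """Plane wave exp(i (k z - omega t))."""
    return cmath.exp(1j * (k * z - omega * t))


def schrodinger_residual(k, omega, z=0.3, t=0.5, h=1e-4):
    """|i hbar dPsi/dt + (hbar^2 / 2m) d^2Psi/dz^2| via central differences."""
    d_t = (psi(z, t + h, k, omega) - psi(z, t - h, k, omega)) / (2 * h)
    d_zz = (psi(z + h, t, k, omega)
            - 2 * psi(z, t, k, omega)
            + psi(z - h, t, k, omega)) / h**2
    lhs = 1j * HBAR * d_t
    rhs = -(HBAR**2) / (2 * MASS) * d_zz
    return abs(lhs - rhs)


k = 1.5
omega_good = HBAR * k**2 / (2 * MASS)  # free-particle dispersion relation
omega_bad = 2.0 * omega_good           # deliberately wrong frequency

residual_good = schrodinger_residual(k, omega_good)
residual_bad = schrodinger_residual(k, omega_bad)
print(residual_good, residual_bad)
```

With the correct dispersion relation the residual is at the level of numerical error, while the deliberately wrong frequency leaves a residual of order one. That is the sense in which the equation encodes $E = p^2/2m$.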

$\endgroup$
7
  • 1
    $\begingroup$ This is exactly how Schiff starts out. $\endgroup$ Commented Jan 17, 2020 at 0:10
  • $\begingroup$ But then the question is why the Hamilton operator represents the energy in a system and why for a particle in a potential (which is the original Schrödinger equation) it should take the rather strange form $\frac{\Delta^2}{2m}-V(x)I$. $\endgroup$ Commented Jan 17, 2020 at 10:58
  • $\begingroup$ Actually Feynman spends a significant amount of Volume 3 on making the Schrödinger equation 'plausible' (for me it is a derivation) $\endgroup$
    – lalala
    Commented Jan 17, 2020 at 11:59
  • 3
    $\begingroup$ You said: "You don't even need the de Broglie's hypothesis!" Then you write, the momentum operator is $p = -i \hbar \frac{\partial}{\partial x}$. But postulating this definition of momentum operator actually is postulating de Broglie's hypothesis $p=\hbar k$, because $-i\frac{\partial}{\partial x}$ is nothing but the wave number $k$. $\endgroup$ Commented Dec 17, 2020 at 12:55
  • $\begingroup$ You don't need the de Broglie's hypothesis to get $i \hbar \frac{d \psi}{dt} = H \psi$. I edited it to make that step more clear $\endgroup$ Commented Dec 17, 2020 at 21:19
41
$\begingroup$

Although knzhou's answer makes a good point stressing the possibility that what is taken as a starting point at the introductory level could become a consequence of a more fundamental principle, I think that there is a key point that should be stressed more clearly.

In physics, whatever conceptual tool we develop has to be rooted in, and motivated by, the need to describe and predict what happens in the real world.

Every theory we have is not just an equation; it is based on some definitions (always conventional; definitions can be useful or not, but never true or false), on some formal apparatus, and on a set of principles which are a convenient way to summarize a lot of experimental activity.

An equation like $\vec F = m \vec a$, within classical mechanics, can be taken as a principle (Newton), or it could be "derived" from a more geometric point of view by referring to groups of transformations on symplectic manifolds. But the important thing that shouldn't be forgotten is that it is an equation within a theory describing the dynamical behavior of macroscopic bodies under a certain set of conditions.

Beyond the range of applicability of classical mechanics, some new physics enters the game. New physics means that some experimental findings are no longer described by Newton's equations (regardless of whether they are assumed as principles or derived within a more general approach), and one has to find a new theory.

This change from a theory (or better from a set of equivalent theories) to another set is the irreducible step that justifies the statement that Schrödinger's equation cannot be derived. To be more precise, Schrödinger's equation can be derived, if one assumes as a starting point an equivalent equation. But it cannot be derived from starting points that are not consistent with quantum mechanics. For example, there is no way to deduce Schrödinger's equation from classical mechanics. The best one can do is to recast classical mechanics in the form the closest to quantum mechanics. Still, at some point, a key conceptual difference justified by experiments has to appear. Without that, Physics would be a branch of Mathematics.

$\endgroup$
9
$\begingroup$

Equations aren't 'derived' in a fully rigorous way in physics, as the derivation always uses physics in some or all of its key steps. Also, physicists have access to tools which mathematicians do not have access to, because they do not require full rigour in their derivations: Feynman path integrals are a prime example.

As an example, in the derivation of the Klein-Gordon equation a key step is to take the square root and then only keep the positive root even though the square root function is multi-valued, but this is physically reasonable as the negative square root would represent a negative energy solution. This is why I am not really sure how I feel about attempts to take physical theories and reduce them to a fully axiomatic form, as that might not always be possible or even useful.

The question also depends on what you mean by a derivation. The derivation of the Einstein equations from the differential Bianchi identity involves some key physical assumptions and so is presumably not a 'real' derivation in your eyes, but those same equations can be derived by taking a variation of the Einstein-Hilbert action and you could argue that this derivation is legitimate as it relies on standard calculus of variations. This type of derivation is key in modern theoretical physics and traces back to Noether (maybe the most important concept in theoretical physics).

$\endgroup$
7
$\begingroup$

Start with the classical nonrelativistic energy expression. Make the de Broglie assumption that matter, not just light, can be described by waves. As a consequence, identify $E$ with $i\hbar\,\partial_t$ and, similarly, $p$ with $\frac{\hbar}{i} \nabla$. There you have the Schrödinger equation.
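
Spelled out as a hedged sketch, using the standard sign convention for a plane wave $e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}$ with $E = \hbar\omega$ and $\mathbf{p} = \hbar\mathbf{k}$ (the explicit potential $V(\mathbf{x})$ is an assumption about which energy expression is meant):

$$E = \frac{\mathbf{p}^2}{2m} + V(\mathbf{x}), \qquad E \to i\hbar\,\partial_t, \qquad \mathbf{p} \to \frac{\hbar}{i}\nabla$$

$$i\hbar\,\partial_t \psi(\mathbf{x}, t) = \left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}, t)$$

The substitution rules are exact for a single plane wave; assuming they continue to hold for general $\psi$ and nonzero $V$ is the postulate hidden in this route.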

$\endgroup$
7
$\begingroup$

You could look at Schrödinger's original paper where he introduces the equation. It's actually very nicely written.

E.Schrödinger, An Undulatory Theory of the Mechanics of Atoms and Molecules, Physical Review (1926) Vol. 28, No. 6 pp. 1049-1070

As people have pointed out, you need to make some assumptions to derive the equation. Schrödinger's approach was to say: in optics we can model light by waves (wave optics) or by light rays (geometric optics); geometric optics can be obtained as a short-wavelength approximation of the underlying wave theory. Hamilton's formulation of geometric optics is actually very similar to his later formulation of classical mechanics, so Schrödinger was looking for an underlying (dispersive) wave theory that would produce classical mechanics as the short wavelength limit.

In Hamiltonian/Lagrangian mechanics, there is a quantity called the principal action W: fix a basepoint x, then for any y, W(y) is the integral of the Lagrangian along an action-minimising trajectory from x to y. This function satisfies the Hamilton-Jacobi equation $\partial W/\partial t = -H$. If your system is autonomous (H is independent of t) then you get $\partial^2 W/\partial t^2=0$ so $W=-Ct+S(x,y,z)$ for some constant C and some function S.

In wave optics, waves satisfy the wave equation (possibly dispersive). To get to geometric optics, you end up looking at waves $e^{iW}$, where W is the "eikonal", a function in geometric optics that plays the same role as the principal action in Hamiltonian mechanics. So Schrödinger guessed that the wave equation of quantum mechanics should be the dispersive wave equation with the dispersion relation chosen to ensure that $e^{iW}$ is a solution, where W is the principal action. The identification of the constant C with $E/\hbar$ is then made for consistency with Einstein/Planck/de Broglie.
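
A rough sketch of how this guess leads to the time-independent equation, assuming a single particle of mass $m$ with $H = p^2/2m + V$ and writing the phase as $W/\hbar$ with $W = -Et + S$ (the notation here is chosen for illustration and may differ from Schrödinger's paper). The Hamilton-Jacobi equation gives $|\nabla S|^2 = 2m(E - V)$, so surfaces of constant $W$ move with phase velocity

$$u = \frac{E}{|\nabla S|} = \frac{E}{\sqrt{2m(E - V)}}$$

Inserting this into the wave equation $\nabla^2\psi - \frac{1}{u^2}\partial_t^2\psi = 0$ with $\psi = \varphi(x,y,z)\,e^{-iEt/\hbar}$, so that $\partial_t^2\psi = -(E^2/\hbar^2)\psi$, gives

$$\nabla^2\varphi + \frac{2m}{\hbar^2}(E - V)\,\varphi = 0$$

which is the time-independent Schrödinger equation. The choice of frequency $E/\hbar$ is the input from Einstein/Planck/de Broglie mentioned above.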

I wrote a more detailed blog post about this back in 2012:

http://jde27.uk/blog/why-schrodinger.html

but I recommend reading Schrödinger's paper instead!

$\endgroup$
2
  • $\begingroup$ I think it'll take a lot of time for me to understand your blog post, but it's very interesting to see how Schrodinger came up with it. I couldn't follow everything with a simple read through, but it looks pretty interesting. Thanks for the contribution. $\endgroup$ Commented Jan 20, 2020 at 6:16
  • 1
    $\begingroup$ Thanks for your blog. It's eminently illuminating. I was looking for such an explanation for years. $\endgroup$
    – Amey Joshi
    Commented Jan 3, 2021 at 12:19
4
$\begingroup$

Partial differential equations are derived from basic principles of physics, such as conservation of energy or quantization of energy. They are not axioms. I prefer starting with the Hamiltonian and the principle of least action via calculus of variations, which is indeed axiomatic.

PDEs have an infinite number of solutions. The physically reasonable ones are picked out by assuming boundary and initial conditions. For the Schrodinger equation, these are usually assumptions about the far-field behavior, and these assumptions have to be physically reasonable, that is, they can't violate what we know from experiments.

Are physical assumptions axioms? This is the key difference between pure and applied mathematics or mathematical physics - the latter recognizes that physical assumptions can't be ignored and are in a sense axioms. I would argue that physical assumptions can be used in proofs as axioms and do not compromise rigor. Pure mathematicians would likely disagree with me.

The plane wave derivation above is an assumption about the far-field behavior of solutions of the Schrodinger equation. And given the assertion "...since this is a very simple and a natural way of obtaining it using knowledge at the time", the professor's assertion answers Feynman's objection quite naturally. Plane waves were very well known in Schrodinger's time from the well-studied wave equation. The key is that Schrodinger realized that the equation described phenomena other than plane waves. It neatly answers the "where did the equation come from" question.

$\endgroup$
3
$\begingroup$

Suppose you concluded after seeing the double slit experiment that the position of a particle is in a (linear) superposition of all positions:

$$|{\psi}\rangle = \sum_i \psi_i |x_i\rangle \xrightarrow[\text{cont. limit}]{} \int \mathrm{dx}\ \psi(x) |x\rangle$$

such that the absolute square of $\psi(x)$ gives the probability distribution of finding the particle at $x$ (the Born rule): $$\rho(x) \equiv \psi^*(x)\psi(x) = |\psi(x)|^2$$

Indeed, if the coefficients $\psi(x)$ are complex, you get interference terms that are in agreement with experiment ($|\psi(x) + \phi(x)|^2 = |\psi(x)|^2 + |\phi(x)|^2 + 2 \Re(\psi^*(x)\phi(x))$). Probability distributions need to be normalised, which means the state vectors need to be normalised: $$\quad \||\psi\rangle\|^2 = \langle\psi|\psi\rangle = \int \mathrm{dx}\ \mathrm{dx'}\ \psi^*(x)\psi(x') \underbrace{\langle x|x'\rangle}_{\delta(x - x')} = \int \mathrm{dx}\ \rho(x) = 1$$

To define the dynamics, there's an operation that evolves the system in time: $$|\psi\rangle(t_1) \xrightarrow{U} |\psi\rangle(t_2)$$ Since states are now vectors, this operation must respect the vector space structure; i.e. $U$ must be a linear operator. Furthermore, it must respect that $\rho(x)$ is a probability distribution: $$|\psi\rangle(t_2) = U|\psi\rangle(t_1), \quad \langle\psi|U^*U|\psi\rangle \overset{!}{=} \langle\psi|\psi\rangle = 1 \iff U^*U = \mathbb{1}$$ i.e. no matter what $t_1$ and $t_2$ are, $U$ should be unitary. In general, a unitary operator can be written in the form: $$U = e^{A}$$ where $A$ is an anti-hermitian operator: $$A^* = -A$$ Indeed, $U^*U = e^{A^* + A} = e^0 = \mathbb{1}$. An anti-hermitian operator is the imaginary unit times a hermitian operator: $A = i K$. Now (assuming $K$ commutes with $\frac{dK}{dt}$, as it does when $K$ is proportional to $t$): \begin{align*}|\psi\rangle(t) &= U|\psi\rangle(t_0)\\ &= e^{iK}|\psi\rangle(t_0)\\ \frac{\partial}{\partial t}|\psi\rangle(t) &= \frac{\partial}{\partial t}e^{iK}|\psi\rangle(t_0) = i\frac{dK}{dt} e^{iK}|\psi\rangle(t_0) = i\frac{dK}{dt} |\psi\rangle(t)\\ \implies -i\frac{\partial}{\partial t}|\psi\rangle(t) &= \frac{dK}{dt} |\psi\rangle(t)\end{align*} Identifying the hermitian operator $H \equiv -\hbar\frac{dK}{dt}$ with the Hamiltonian, you get the Schrödinger equation $i\hbar\frac{\partial}{\partial t}|\psi\rangle(t) = H|\psi\rangle(t)$.
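
As a small numerical illustration of this argument, the sketch below takes the time-independent case $U(t) = e^{-iHt/\hbar}$ (the standard sign convention) for a two-level system. The Hamiltonian $H = \hbar\omega\,\sigma_x$, the value of $\omega$, and the units $\hbar = 1$ are all assumptions chosen so that the exponential has the closed form $\cos(\omega t)\,\mathbb{1} - i\sin(\omega t)\,\sigma_x$. The helper names are hypothetical.

```python
import math

HBAR = 1.0   # natural units, assumed for illustration
OMEGA = 0.8  # assumed frequency; H = HBAR * OMEGA * sigma_x

H = [[0.0, HBAR * OMEGA],
     [HBAR * OMEGA, 0.0]]


def evolution(t):
    """U(t) = exp(-i H t / HBAR) = cos(wt) 1 - i sin(wt) sigma_x."""
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return [[complex(c, 0.0), complex(0.0, -s)],
            [complex(0.0, -s), complex(c, 0.0)]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


t, dt = 1.3, 1e-6
U = evolution(t)

# Unitarity: U^dagger U should be the identity.
identity = [[1, 0], [0, 1]]
unitarity_error = max_abs_diff(matmul(dagger(U), U), identity)

# Schrodinger equation for the propagator: i hbar dU/dt = H U.
U_plus, U_minus = evolution(t + dt), evolution(t - dt)
lhs = [[1j * HBAR * (U_plus[i][j] - U_minus[i][j]) / (2 * dt)
        for j in range(2)] for i in range(2)]
rhs = matmul(H, U)
schrodinger_error = max_abs_diff(lhs, rhs)
print(unitarity_error, schrodinger_error)
```

Both errors come out at the level of rounding and finite-difference error. The check shows that a hermitian generator produces unitary, norm-preserving evolution; it does not say which hermitian operator nature picks as the Hamiltonian.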

In fancy words, the equations and mathematics of QM can be derived naturally if one admits that symmetry groups in nature (the Poincaré group) should be treated through a unitary representation (the Schrödinger equation comes from time translations, as we have seen above). Woit addresses this beautifully in his book "Quantum Theory, Groups and Representations" (available for free).

$\endgroup$
5
  • $\begingroup$ the evolution operator is $e^{iH\color{red}{t}}$. $\endgroup$ Commented Jan 19, 2020 at 0:38
  • $\begingroup$ There was an error, but that's not what I intended to write (evolution operator is $\mathrm{T} e^{i\int \mathrm{dt} H}$). $\endgroup$ Commented Jan 19, 2020 at 8:30
  • $\begingroup$ I don't really get the point of this. Isn't $\frac{dK}{dt}$ in this case arbitrary? There's no justification that this is the hamiltonian or that it has any structure for that matter. (maybe it's a very complicated time dependent matrix, for example) Is this part answered? $\endgroup$ Commented Dec 17, 2020 at 21:27
  • $\begingroup$ Also something else I don't get with this derivation with unitary matrices, is that I don't understand how this can't apply to normal probability distributions - the "quantum interference" aspect isn't exactly covered, and it just seems like this could be applied to any probability theory. $\endgroup$ Commented Dec 17, 2020 at 21:28
  • $\begingroup$ $\frac{dK}{dt}$ is what we would call in physics the generator of time translations. This is the Hamiltonian by definition. Concerning the second question: the point is that we take for granted the Born rule (which can be deduced from experiment); namely that the probability distribution is the norm square of some complex vector $\rho = |\psi|^2$. $\endgroup$ Commented Dec 18, 2020 at 11:01
3
$\begingroup$

As others have said, a derivation means a derivation from postulates or axioms. Postulates can be motivated (as for example in Schrödinger's original treatment), but they cannot be derived. So, the question is really "what axioms are needed for a mathematical treatment of quantum mechanics; is Schrödinger's equation an axiom, or is it a theorem?"

Textbooks are usually more concerned with practical application than with mathematical structure, and generally treat Schrödinger's equation as a postulate, but it is in fact a theorem and can be derived from the Dirac–von Neumann axioms. An outline of the derivation is given at Derivation of the Schrödinger equation. I have given detailed derivations in The Hilbert Space of Conditional Clauses and in A Construction of Full QED Using Finite Dimensional Hilbert Space.

The key postulate is that probabilities are given by the Born rule (or expectations given by the inner product). One also requires that the fundamental physical behaviour of matter does not change. This enables one to show that the probability interpretation requires unitary time evolution satisfying the conditions of Stone's theorem, and the general form of the Schrödinger equation follows as a simple corollary.
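
In outline, and only as a sketch of the standard statement rather than of the detailed derivations cited above: if time evolution is given by a strongly continuous one-parameter family of unitary operators,

$$U(t+s) = U(t)\,U(s), \qquad U(t)^\dagger U(t) = \mathbb{1}$$

then Stone's theorem guarantees a self-adjoint generator $H$ (the factor $\hbar$ is a conventional choice of units) such that

$$U(t) = e^{-iHt/\hbar} \quad\implies\quad i\hbar\frac{d}{dt}|\psi(t)\rangle = H\,|\psi(t)\rangle$$

The theorem fixes the form of the equation; the specific $H$ for a given system still has to come from physical input.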

The Schrödinger equation is also constrained by relativistic considerations, from which one finds the Dirac equation, and the form of the interaction density which must be composed of field operators obeying the Locality (or microcausality) condition, that (anti-)commutators vanish outside the light cone. Non-relativistic forms of the Schrödinger equation are seen as semi-classical approximations in which the photon field operator is replaced by its expectation.

$\endgroup$
0
3
$\begingroup$

I see people deriving the Schrödinger equation with the help of the wave function, T+U and partial differentials in three space coordinates and one time coordinate.

Indeed, there are various "derivations" of varying complexity like the process often encountered, using de Broglie ideas about matter waves to guess or arrive at the new equation for the wave. This is quite close to but not exactly the same way that Schroedinger used to discover the thing.

These are not derivations in the strict sense of the word (like in mathematics). Such a derivation would assume only known things and derive the Schroedinger equation as a necessary consequence. No such derivation exists at the level of knowledge and skill of 3-5 year university physics programmes, because there are not enough simpler known things that would necessarily imply the Schroedinger equation.

Historically, we have the Schroedinger equation not because somebody derived it, but because Schroedinger discovered it, as he was thinking of ways to describe atomic systems in terms of a wave in agreement with the Einstein-Planck relation $\Delta E = h\nu$ and de Broglie's ideas about matter particles having frequency and wavelength. He based his discovery on, or was inspired by, Hamilton's and Jacobi's theories of classical mechanics and their similarity to geometrical optics.

There are some derivations on the level of quantum field theory or some other formal framework of theoretical physics, but as far as I have seen they lack in clarity and usefulness to a person on the level of undergraduate physics student - in these derivations one needs to assume things that are equally or more incomprehensible than the Schroedinger equation itself, such as quantized theory of field (infinitely many degrees of freedom), more difficult mathematics of distributions and operators, for more than 1 particle there are interactions that are hard to describe exactly, there are infinity cancellations or spurious removals needed to get results etc.

Maybe there is a theory deeper than quantum theory that is easy to understand, and the Schroedinger equation for atoms and molecules comes out of some specialized assumptions there, similarly to how scalar wave optics in a small-gradient medium can be explained as a simplified, inaccurate model of the full electromagnetic theory of light and matter (Maxwell's equations and some constitutive relations). But so far nobody has found such a more general theory with explanatory power (as EM theory has for simple wave optics), and thus the best we can do when teaching Schroedinger's equation and related things is to use a combination of:

  1. heuristic or formal "arrivals" at Schr. Eq. from other knowledge of atomic/molecule physics and quantum theory
  2. the origin story of Schr. Eq.
  3. examples of how to use the equation and examples of its successes when compared to measured data (such as atomic spectra) or other results of experiments (such as diffraction patterns or detector counts)
$\endgroup$