
I keep seeing talk here of how simplicity isn’t that important when deciding between theories but after looking at the different kinds of virtues behind theory choice, I fail to see how the rest can be any more important. Please let me know if I’m missing something here.

Here is the link I was looking at: https://www.oxfordbibliographies.com/abstract/document/obo-9780195396577/obo-9780195396577-0409.xml#:~:text=A%20theoretical%20virtue%20in%20science,consistency%2C%20coherence%2C%20and%20fertility

People often talk about comparing theories using explanatory power. But any theory can be adjusted to explain anything. For example, me waking up at 7 am can be explained through physical processes or an invisible goblin or god working in some hidden way (still using the same physical processes) to then cause me to wake up. So it seems that you can always get theories to have a tie on this.

What about coherence? Well, coherence is about how well a theory accords with other well-established beliefs. But then that just begs the question of the importance of those well-established beliefs. A person seeing a life event as a sign from god coheres with his theistic beliefs. An atheist will not see a life event as a sign from god, since it doesn't cohere with his atheistic beliefs. Again, we are at a standstill.

Predictive power is another commonly cited virtue. A theory that makes predictions is more likely to be true than another theory that doesn’t. But again, one can still create an infinite number of mutually contradictory theories that explain the same events with the same predictions! For example, the predictions made by Newton’s laws are evidence for a world with Newton’s laws but no other forces at work. But they are also evidence for supernatural forces or beings such as god working using Newton’s laws. Or a devil causing events in the world using Newton’s laws. Or even some deeper level of reality that only looks as if it’s behaving using Newton’s laws but is really behaving using something else.

Now of course, one could argue that simplicity is just as subjective as these virtues but without it, can someone tell me how to choose between a theory that says god works with the world using Newton’s laws vs. Newton’s laws working by themselves?

If anything, this seems like the most objective criterion compared to the rest. At the very least, when we imagine simulating this as a program, simulating a god plus the universe clearly seems harder to do than simulating the universe alone. And thus the former would intuitively be more complex. I'm not sure the same intuition holds for the rest of the virtues.

So why then is Occam's razor seen as nothing but a very basic and subjective tool? Without it, there seems to be no way to choose between theories at all.

12
  • 5
    No. A theory cannot be "adjusted" to explain anything. Invisible goblins without a detailed account of what they do, how they do it, when they act, etc., do not explain anything. And this is enough to prefer Newton's laws to goblins without any 'simplicity'. Coherence is not just about how 'well' a theory accords with other theories, it has to be internally coherent first. This is as basic a precondition as predictive accuracy. Scope and unification, not mentioned, address the class of phenomena covered without ad hoc "adjustments". Simplicity is only of use for excluding overt redundancy.
    – Conifold
    Commented May 30 at 6:25
  • If any theory can be adjusted to explain anything, then is that ADJUSTED theory still more simple than the other competing theories? Simplicity on its own isn't important, it needs to also explain. The simplest theory of physics which doesn't explain the stuff we see doesn't matter - and if you say "well it can be adjusted to explain whatever", then okay, how simple does it look after adjustment? If the adjusted theory explains stuff, and not the original theory, then we care about the adjusted one and the simplicity of it.
    – TKoL
    Commented May 30 at 6:31
  • Yes they can! @Conifold What you're saying might apply to goblins by themselves, but not to a theory that says goblins work using laws, for example. The detailed account is the laws themselves that the goblins are working with. The "how" is the laws, the "when" is everywhere (assuming the laws apply every time), etc. You can say that this is not adding anything to the law-like explanation, but that's the point: you'd be ruling it out using simplicity. It is easy to create a tie for scope, by the way. A theory adjusted to take care of a faulty piece of data explains everything
    – Lajar
    Commented May 30 at 12:18
  • 1
    P.S. one can always argue that a certain entity is not redundant to a theory, as many theists do with respect to god. Most say, for example, that god is needed to explain the existence of scientific laws. So simplicity isn't just based on obvious redundancies. As a first step, proponents of both theories must agree on what is obviously redundant. But they rarely ever do, and agreement or disagreement shouldn't play a role here
    – Lajar
    Commented May 30 at 12:46
  • 1
    Occam's razor is a subjective tool because "simplicity" is subjective. How would you objectively measure simplicity? By counting the number of axioms required and then the number of proofs needed to reach the conclusion? That's not at all possible to do.
    – SkySpiral7
    Commented May 30 at 22:50

10 Answers

12

If two theories explain and predict exactly the same set of facts, then the less convoluted theory should be preferred. However, if two theories do not explain and predict exactly the same set of facts, it is unclear why simplicity should be the most important criterion. Instead, why not prefer the theory that can explain and predict more? Or the theory that strikes the "best balance" between explanatory power and complexity, according to some scoring function? Or the theory that strikes the "best balance" according to several other considerations?

To illustrate pictorially why simplicity alone doesn't necessarily determine the best theory, suppose you can formalize theories in such a way that you can compute two scores: a complexity score (lower is better) and an explanatory power score (higher is better). This allows you to plot theories on a Cartesian plane, as shown below:

[Figure: theories T1–T4 plotted on a Cartesian plane, with complexity on the x-axis (lower is better) and explanatory power on the y-axis (higher is better)]

Certain judgments can be easily made based on this plot, such as T1 being better than T4 because both have the same explanatory power, but T1 is simpler, or T2 being better than T3 because both have the same complexity, yet T2 explains more. However, judging other pair-wise comparisons is not as straightforward. For example, is T2 better than T1? According to simplicity, no, because T2 is more complex. However, according to explanatory power, yes, because T2 explains more. Historically, scientists have preferred theories that can account for more facts, such as general relativity over Newtonian mechanics, despite the former being more mathematically complex. This preference for greater explanatory power reflects a deeper scientific commitment to understanding the broadest range of phenomena, even at the cost of increased complexity.

Nonetheless, it's not clear why that should be considered the "optimal" preference function, especially since this only considers two criteria (complexity and explanatory power) without any rigorous formalization. Formalizing these concepts mathematically such that you can create plots like the one above introduces its own non-trivial complications, and there is no reason to limit theory choice to these two aspects alone. Why not include, for instance, predictive accuracy, coherence, computational feasibility, and other relevant factors, leading to multi-dimensional plots that better capture the complexity and trade-offs involved in theory choice? And what would be the optimal multi-variate function that would effectively balance these diverse criteria to guide theory choice in a comprehensive and nuanced manner? Personally, I don't know.
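
These dominance judgments can be made concrete as a Pareto-frontier check. The sketch below is illustrative only: the theory names match the plot, but the numeric scores are invented placeholders, since no real scoring function is given.

```python
from dataclasses import dataclass

@dataclass
class Theory:
    name: str
    complexity: float         # lower is better
    explanatory_power: float  # higher is better

def dominates(a: Theory, b: Theory) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    return (a.complexity <= b.complexity
            and a.explanatory_power >= b.explanatory_power
            and (a.complexity < b.complexity
                 or a.explanatory_power > b.explanatory_power))

# Hypothetical scores consistent with the plot described in the text.
theories = [
    Theory("T1", complexity=1.0, explanatory_power=2.0),
    Theory("T2", complexity=2.0, explanatory_power=3.0),
    Theory("T3", complexity=2.0, explanatory_power=1.0),
    Theory("T4", complexity=3.0, explanatory_power=2.0),
]

# The Pareto frontier: theories not dominated by any other theory.
frontier = [t for t in theories
            if not any(dominates(u, t) for u in theories if u is not t)]
print([t.name for t in frontier])  # -> ['T1', 'T2']
```

T3 and T4 are dominated (T2 explains more at the same complexity; T1 is simpler at the same explanatory power), but T1 and T2 both survive: neither dominates the other, which is exactly the comparison that simplicity alone cannot settle.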

11
  • The other criteria can be used to not prefer theories that don't give the same predictions or explain the same things. That of course is not in dispute. The problem, though, is that there is still an infinite number of theories that technically make the same predictions and have the same explanatory power. Or if they don't, they can be constructed post hoc or ad hoc. The only way to not prefer post hoc or ad hoc theories is through simplicity. Do you have a criterion by which to not prefer ad hoc/post hoc theories without simplicity?
    – Lajar
    Commented May 30 at 0:16
  • @Lajar The problem though is that there are still an infinite number of theories that technically make the same predictions and have the same explanatory power. - Sure, those would lie on a horizontal straight line on the Cartesian plot for a given fixed value of explanatory power. Do you have a criterion by which to not prefer ad hoc/post hoc theories without simplicity? - No. But, again, that's the particular case of competing theories lying on a horizontal straight line, which are trivial to solve (just pick the least complex one).
    – user66156
    Commented May 30 at 0:32
  • 1
    "However, if two theories do not explain and predict exactly the same set of facts," - if they explain a disjoint set of facts then they are incomparable, you can't say one is better or worse, and you could use both. If they explain mostly the same set of facts then they are both incomplete and you should extend them to explain the rest of the facts, and then judge them based on simplicity.
    – causative
    Commented May 30 at 1:39
  • 1
    +1 I like this answer, especially the idea of the n-dimensional graph of considerations. Commented May 30 at 8:52
  • 1
    I wish you had made the horizontal axis "simplicity" instead of "complexity", and talked about the Pareto frontier. It still exists in your plot, but it's harder to visualize.
    – benrg
    Commented May 30 at 20:14
4

The following are some criteria for a good (scientific) theory

  1. Explanatory power (the more stuff that can be explained by a theory T, the better T is. Includes retrodiction, explaining past events, and prediction, fortune-telling basically)
  2. Consilience/Coherence (Internal consistency as well as external consistency, with other well-established theories)
  3. Testability (Popperian falsifiability, linked to predictions)
  4. Simplicity (last but not least, pluralitas non est ponenda sine necessitate, the famous novacula Occami)

I believe I've unwittingly ranked the criteria ... in order of importance.

There's a mathematical treatment of Occam's razor; you can find it on Wikipedia.

7
  • 1
    Why do you rank "testability" lower than "consilience"? Does the ranking really matter much (considering that all points are extremely vague)?
    – mudskipper
    Commented May 30 at 15:59
  • 1
    @mudskipper you might have a good theory that fits with other things but is hard to test. Like gravitational waves, for example. You can make progress and work on the testing part when possible. Computer programming is the same way: you have to have a program before you can test it.
    – Scott Rowe
    Commented May 30 at 23:51
  • 1
    +1 Your preference of importance, anyway. I like how you rolled verification, confirmation, and falsification up under testability. Nice.
    – J D
    Commented May 31 at 22:06
  • @mudskipper, a good point. The book I got this info from was meant for novices, me being one. It was, as you've noticed, low on details/specifics. If you'd like to edit my answer, do so (testability is what differentiates science from pseudoscience). You must be in the know about string theory; it's a good candidate for a Theory of Everything except it's not testable (word is that the energy levels for apropos experiments are currently beyond human capabilities). String theory is not considered pseudoscience, though, and perhaps you should make a note of that fact.
    – Hudjefa
    Commented Jun 2 at 0:02
  • @JD, Cogito, very non liquet, a theory makes a prediction. You test the theory by conducting an experiment to see if the prediction comes true (verification of the prediction). If the prediction comes true, this is confirmation of the theory (some say this view is now obsolete). If the prediction doesn't come true, this is falsification of the theory. Agree/Disagree/Both/Neither?
    – Hudjefa
    Commented Jun 2 at 0:08
4

How about you look at this less from an abstract and theoretical perspective and more from a perspective of practical application?

Like, when is a theory useful? When it helps you navigate your "life" (private, professional, social, philosophical, logical, ... whatever). That is, when it lets you derive rules for how you should act in a given scenario that are correct, or at least reliably incorrect, so that following them leads to a positive (defined by you) outcome more often than not.

So the first arbiter is: does it work? Or in other words: does it perform better than random guessing? If it doesn't... well... fuck that theory, simple as that.

So the "explanatory power" of "a god or invisible goblin did it" is practically 0. Seriously, claiming that there are no patterns to a phenomenon and that it happens at the will of a being we can't see, feel or comprehend is conceding defeat to the very idea of gaining any knowledge in the first place.

So even theologies usually don't stop there, but try to create a psychology of god and infer motivations and rationales from past events. Or, you know, try to charm that god by building nice structures, doing the things that led to desirable outcomes (as those might have been rewards), and avoiding the stuff that's likely to get you killed (because that might have been punishment for bad behavior).

Now these assumptions don't have to be correct, but they do provide you with a simple, comprehensible narrative and if you follow them you're less likely to die than by random chance. So while far from perfect it's at least a useful theory. It's a heuristic that produces positive outcomes at a higher rate than random chance.

I could now go on listing other arbiters, but it's probably only fair to acknowledge that "does it work" is already a meta-category covering a whole range of different parameters of a theory. But I think it's nonetheless important to emphasize what the goal of the whole exercise is, because if you laser-focus on one of the parameters and seek to maximize it while diminishing the theory's usefulness, you've not actually done yourself any favor.

So for example "simplicity", or let's call it "applicability", is a major factor in the usefulness of a theory. If you can mathematically prove that you've got the perfect theory to determine what you should have for lunch, but applying it requires a 20-year program to become the high priest of the religion of not dying of hunger, then building a supercomputer and letting it crunch the numbers for 2000 years... you might have reached a point where the perfect answer is no longer of any use to you.

Also, it's comparatively easy to make theories complex: just wait till they fail (which they likely will, if they are simple), and then add that failure as an exception to your rule. It won't take long before your theory moves from something simplistic to something more like a D&D rulebook, or, you know, the legal code of most modern countries...

Which is obviously a problem, because the rules that apply to everyone, and that everyone should know and abide by, are so plentiful that you need specialized nerds called "lawyers", "attorneys" and "judges" to tell you what you're technically supposed to know already. And a whole different guild of nerds that makes new rules, preferably without creating an unintended interplay with the old rules.

At which point communication between the experts and the layperson becomes increasingly difficult, because they develop their own, often not mutually intelligible, jargon. This leads to the point where most people don't actually know the law, but follow much simpler theories of law which they "feel" cover their everyday interactions better, or at least sufficiently. Until they get into conflict with the law, because their theories were too simplistic for the real world, which might be exactly why the laws they didn't care about were included in the first place.

So being able to wield a theory, that is, being able to apply it, derive its application in reasonable time, comprehend it sufficiently, and amend and communicate it with other people, are things that all rest upon its simplicity. So yeah, simplicity is a big deal.

BUT if you were to maximize for simplicity alone, your theory would reduce to random guessing and/or treating every possible outcome with "shit happens". It hardly gets simpler than that, but it's also pretty damn useless as a theory.

Another thing worth considering is whether you fracture or merge. If you have a theory of everything, you can easily end up carrying tomes around when all you need is a list of a few bullet points. But if you have a bullet-point list for every topic, you might nonetheless end up carrying tomes around, and worse: if the lists all developed independently of each other, there's a good chance there are tons of interfaces where two theories don't converge, each giving different advice on what you should do, meaning you'd need to invent a new scientific domain for that interface, creating even more interfaces with every other theory.

And so on. So, TL;DR: be careful not to confuse "most important criterion" with "all that matters", because maximizing one parameter can also make things worse rather than better.

14
  • I care about a theory’s truth not its usefulness
    – Lajar
    Commented May 30 at 12:29
  • 2
    @Lajar How do you assess its truth outside of its usefulness?
    – haxor789
    Commented May 30 at 12:32
  • 2
    @Lajar ??? How does simplicity tell you anything about the truth of a theory? Like, for all intents and purposes it might be goblins at work, so your simplistic theory is wrong, right? It would nonetheless be useful to use the one without the additional goblins, as less randomness and less overhead make a theory more useful and applicable.
    – haxor789
    Commented May 30 at 13:09
  • 1
    For planetary motion, they thought, "circles are simple and perfect." Unfortunately they didn't match the data. So they added circles on the circles, and sometimes circles on those circles. The simplicity was getting hard to handle. Then that Kepler guy said, "it's the ellipse, stupid" and that worked much better. Mercury still had a problem, but Einstein sorted that out by taking a couple stitches in the fabric of space. Simple!
    – Scott Rowe
    Commented May 30 at 23:47
  • 2
    "How do you assess its truth outside of its usefulness?" Logical coherence is one measure. Predictive power is another. The former is related to the coherence theory of truth and the latter to the correspondence theory of truth.
    – J D
    Commented May 31 at 14:43
3

People often talk about comparing theories using explanatory power. But any theory can be adjusted to explain anything. For example, me waking up at 7 am can be explained through physical processes or an invisible goblin or god working in some hidden way to then cause me to wake up. So it seems that you can always get theories to have a tie on this.

The given link clearly states "Standard theoretical virtues include testability, empirical accuracy, simplicity, unification, consistency, coherence, and fertility."

So simplicity is there. I assume it is contested because, in the history of the advancement of human knowledge, simplicity itself as a tool has very rarely led to an advancement of knowledge. Instead, one of the other virtues led to some advancement, which then later also seemed to simplify the previous theory.

So scientists have always considered the simplest possible way to describe and explain observations, and did not invent additional theories along the way. Other humans came up with all kinds of superstitions and mythology, but their fantasy stories were of no importance to science in history.

So the role of Occam's razor in everyday reasoning about superstitions and other creative theories never contributed much to science. It is more helpful in situations where there are so few and rare observations available that science cannot be applied. And in such situations, there is typically no way to measure how often simplicity led to more true results or to less true results.

1
  • Simpler theories are at least easier to state, remember, test and apply. So they are more likely to be used.
    – Scott Rowe
    Commented May 31 at 0:05
3

In Bayesian inference we must ask: what is the simplest theory that exactly matches all the data? The simplicity of the theory corresponds to the theory's prior probability, and we eliminate any theories that fail to match all the data, so that the simplest theory left standing is the best, out of all theories that were intended to explain that particular set of data.

Complexity of a theory is the main criterion needed to evaluate the theory's prior probability, because of conjunction. A theory can be seen as a conjunction of different parts that all have to be true for the theory to come out right. If A and B are independent parts within the theory, then P(A and B) = P(A) P(B). If an independent part of the theory is something that a priori is a 50% chance, then adding that part to the theory reduces the probability of the overall theory by half. So, adding even a little extra complexity to a theory very rapidly makes it much, much less likely.

Each 50% chance is a "bit." The theory with the fewest independent bits (if it matches the data) is the best theory. If we represent the theory as a computer program that generates the data exactly, then the shortest such computer program has the fewest independent bits, and is therefore the best theory. If another program has only 10 more independent bits, then that reduces the prior probability by a factor of 2^10; it's more than 1000 times less likely.
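
A toy version of this scheme is easy to write down. In the sketch below (the theories and their bit counts are invented for illustration), likelihood is 1 or 0 depending on whether a theory reproduces the data exactly, and the prior is 2^-bits:

```python
# Toy Solomonoff-style inference: prior probability 2^-bits, likelihood 1 if
# the theory reproduces the data exactly and 0 otherwise.
data = "1237"

# (name, description length in bits, data the theory generates).
# The names and bit counts are hypothetical placeholders.
theories = [
    ("physics alone",        100, "1237"),
    ("physics + one goblin", 110, "1237"),  # same output, 10 extra bits
    ("wrong theory",          50, "9999"),  # simplest, but fails to match
]

# Eliminate theories whose output doesn't match the data (likelihood 0).
survivors = [(name, bits) for name, bits, out in theories if out == data]

# Among survivors, prior mass is 2^-bits; normalize to get posteriors.
total = sum(2.0 ** -bits for _, bits in survivors)
posterior = {name: (2.0 ** -bits) / total for name, bits in survivors}
```

Note that the simplest theory of all is discarded outright because it fails to match the data, and among the survivors the goblin's 10 extra bits cost that theory a factor of 2^10 = 1024 in posterior probability, matching the count described above.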

10
  • Hypotheses can always be modified to ensure that their Bayesian likelihood is 1, especially through post hoc theories or ad hoc adjustments to an existing theory, correct? Thus, it seems as if the bulk of comparisons between theories thus rely upon prior probabilities. Which according to this answer is still some version of simplicity. Would you agree?
    – Lajar
    Commented May 30 at 0:18
  • @Lajar We shouldn't speak of modifying hypotheses, because a modified hypothesis is just a different hypothesis. In Solomonoff's theory of inductive inference, hypotheses always have a likelihood of 0 or 1, depending on if they match the data exactly or not, and as you say after we cross out the ones that don't match, we rely only on the prior probabilities.
    – causative
    Commented May 30 at 1:17
  • @Lajar no. What do you mean "modified to ensure a likelihood of 1"? Do you have an example of that?
    – TKoL
    Commented May 30 at 6:32
  • @TKoL If the data is 1237 and P(1237 | hypothesis) = 1, that's a likelihood of 1. Likelihood (according to one statistical definition) is the probability of the data, assuming the hypothesis is true. That's different from probability of the hypothesis, P(hypothesis | 1237), which would generally be significantly less than 1.
    – causative
    Commented May 30 at 10:41
  • @causative I understand what probability is, I'm asking what does it mean to modify a hypothesis to have a probability of 1? My hypothesis is the world is flat - how do I modify this to have a probability of 1?
    – TKoL
    Commented May 30 at 10:50
3

Validity = #1.

Explanatory power is high up there.

Plausibility is handy.

Coherence with other info and theory is good.

Simplicity is compelling. Not at the expense of the other attributes.

2

Five minutes thought should convince you of two things, firstly that 'simplicity' is a hopelessly vague term in this context, and secondly that simplicity is only one of a number of factors that might determine which is the best of a candidate set of theories. To use a parallel, you might just as well ask whether taste, say, is the most important criterion for choosing the best food, or whether comfort is the most important factor in choosing a car. You are making a mistake in assuming that a single factor is of overriding importance, and compounding that mistake by referencing a factor that is inherently ambiguous.

2

Is simplicity the most important criterion when choosing between theories?

Absolutely not. A simpler scientific theory that doesn't accord with empirical data is clearly not better. What makes a theory superior is a complex judgement that depends on context and doesn't reduce to a single factor, since theories are essentially efforts to explain such contexts, and the relevant criteria vary with them.

1

When it comes to scientific theories, simplicity is a red herring.

Predictive power is the only thing that matters. Everything else is a mnemonic or metaphor.

Within the domain of predictive power, the criteria are:

  1. Maximize Reliability: A theory that produces only true predictions is preferred over a theory which produces a mix of true and false predictions.
  2. Maximize Testable Scope: A theory which produces a strict superset of another theory's predictions is preferred, as long as some of its additional predictions are testable.
  3. Minimize Untestable Scope: A theory which produces a strict superset of another theory's predictions should be set aside if none of those additional predictions are testable - to be re-evaluated if any of its additional predictions become testable. (This is because for every set of true predictions, there are an infinite number of possible supersets which also make untestable predictions.)
  4. All else being equal, teach whatever's easiest to teach: If two theories produce exactly the same set of predictions, then they are the same theory and we prefer the simpler explanation of that theory.

Simplicity is only really part of the fourth criterion, which only comes into play if all the others are inconclusive. Many arguments from simplicity, in my opinion, would be better expressed as arguments from untestable scope.

Example 1 - Quantum Physics

Let's consider 4 models:

Evaluating against our four criteria, we get:

  1. Because Hidden Variables makes false predictions, it is discarded.
  2. All remaining interpretations have the same testable scope for now, but if a future development makes many worlds testable, either it or the Copenhagen interpretation will be discarded.
  3. The Copenhagen interpretation makes fewer untestable claims, so it is preferred over many worlds (for now).
  4. The Copenhagen interpretation is difficult to teach, so a modified version which uses another interpretation as a metaphor is what's most commonly taught when explaining quantum physics.
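
For concreteness, the four-step evaluation above can be sketched as a lexicographic filter in Python. Everything here is a toy encoding: the prediction sets, untestable-claim counts, and teaching costs are hypothetical stand-ins chosen only to mirror the verdicts just described.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    true_predictions: frozenset   # testable predictions borne out so far
    false_predictions: frozenset  # testable predictions contradicted by data
    untestable_claims: int        # claims we cannot (yet) test
    teaching_cost: int            # lower = easier to teach

def choose(models):
    # 1. Maximize reliability: discard anything that made a false prediction.
    pool = [m for m in models if not m.false_predictions]
    # 2. Maximize testable scope: drop models whose confirmed predictions are
    #    a strict subset of another survivor's.
    pool = [m for m in pool
            if not any(m.true_predictions < u.true_predictions for u in pool)]
    # 3. Minimize untestable scope.
    fewest = min(m.untestable_claims for m in pool)
    pool = [m for m in pool if m.untestable_claims == fewest]
    # 4. All else being equal, prefer whatever is easiest to teach.
    return min(pool, key=lambda m: m.teaching_cost)

shared = frozenset({"interference", "entanglement"})
candidates = [
    Model("hidden variables", shared, frozenset({"Bell inequalities hold"}), 0, 1),
    Model("Copenhagen",       shared, frozenset(),                           1, 3),
    Model("many worlds",      shared, frozenset(),                           5, 2),
]
print(choose(candidates).name)  # -> Copenhagen
```

Hidden variables falls at step 1, many worlds at step 3, and step 4 is never reached; if many worlds became testable, its scores would change and the filter could pick differently.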

Example 2 - Cosmology/Religion

Most of the time, arguments about Occam's razor in science come up when discussing cosmology and creation. Let's consider 3 models:

  • The Big Bang - "The universe as we know it began with a massive explosion of energy, which we don't know the causes of."
  • Christian God Cosmological Creationism - "Like the above, but the God described in the Bible exists and caused the Big Bang. He has left the universe to do its own thing since then and has not intervened."
  • Christian God Young-Earth Creationism - "God created the world 6000 years ago. He has intervened several times since then."

Evaluating against our four criteria (from the perspective of an agnostic who has no prior conviction about whether or not the Christian God exists), we get:

  1. Young Earth Creationism makes predictions which are inconsistent with our measurements (e.g. carbon dating of fossil records), so it is discarded.
  2. Both remaining theories again have the same testable scope during our lifetimes, but the second theory makes a prediction about what we will experience after death. If, after we die, we remain capable of observations and scientific thought, and observe things which are consistent with the Christian God, we would then be able to discard the Big Bang theory as a subset of CGCC.
  3. Before we die, CGCC makes untestable predictions, so we set it aside (along with an infinite number of other theories about possible Gods who might have caused the Big Bang).

Only one theory remains, so there is no need to proceed to step 4.

0

Well, does anyone have an example of two different theories explaining the same phenomena with the same accuracy? I doubt it. This concept of simplicity is just an idealization of how things work. At least in science, I see theories getting more and more complicated over time.

1
  • "The world around me is real" and "I am a brain in a jar, provided with inputs indistinguishable from those I'd receive if the world was real". If the simpler answer is true, there is no way to prove it as the more complicated answer provides identical observable results. Commented May 30 at 23:10

