32
$\begingroup$

I'm planning to teach a winter course on a varying number of topics, one of which is going to be compilers. Now, I came across this problem while thinking of assignments to give throughout the quarter, but it has me stumped so I might use it as an example instead.

public class DeadCode {
  public static void main(String[] args) {
     return;
     System.out.println("This line won't print.");
  }
}

In the program above, it's obvious that the print statement will never execute because of the return. Compilers sometimes give warnings or errors about dead code. For example, the above code will not compile in Java. The javac compiler, however, will not detect all instances of dead code in every program. How would I prove that no compiler can do so?

$\endgroup$
11
  • 29
    $\begingroup$ What is your background and what is the context you'll be teaching in? To be blunt, I'm mildly worried that you have to ask this, seeing as you are going to teach. But good call asking here! $\endgroup$
    – Raphael
    Commented Nov 11, 2015 at 9:05
  • 5
    $\begingroup$ Same question at Stack Overflow $\endgroup$
    – Moyli
    Commented Nov 11, 2015 at 10:20
  • 9
    $\begingroup$ @MichaelKjörling Dead code detection is impossible even without those considerations. $\endgroup$ Commented Nov 11, 2015 at 20:00
  • 2
    $\begingroup$ BigInteger i = 0; while(isCollatzConjectureTrueFor(i)) i++; printf("Hello world\n"); $\endgroup$ Commented Nov 12, 2015 at 8:41
  • 2
    $\begingroup$ @immibis The question asks for a proof that dead code detection is impossible. You have given an example where correct dead code detection requires solving an open problem in mathematics. That does not prove that dead code detection is impossible. $\endgroup$ Commented Nov 12, 2015 at 16:20

5 Answers 5

57
$\begingroup$

It all comes from undecidability of the halting problem. Suppose we have a "perfect" dead code function, some Turing Machine M, and some input string x, and a procedure that looks something like this:

Run M on input x;
print "Finished running input";

If M runs forever, then we delete the print statement, since we will never reach it. If M doesn't run forever, then we need to keep the print statement. Thus, if we have a dead-code remover, it also lets us solve the Halting Problem, so we know there can be no such dead-code remover.

The way we get around this is by "conservative approximation." So, in my Turing Machine example above, we can assume that running M on x might finish, so we play it safe and don't remove the print statement. In your example, we know that no matter which functions do or don't halt, that there's no way we will reach that print statement.

Usually, this is done by constructing a "control-flow graph". We make simplifying assumptions, such as "the end of a while loop is connected to the beginning and the statement after", even if it runs forever or runs only once and doesn't visit both. Similarly, we assume that an if-statement can reach all of its branches, even if in reality some are never used. These kinds of simplifications allow us to remove "obviously dead code" like the example you give, while remaining decidable.

To clarify a few confusions from the comments:

  1. Nitpick: for fixed M, this is always decidable. M has to be the input

    As Raphael says, in my example, we consider the Turing Machine as an input. The idea is that, if we had a perfect DCE algorithm, we would be able to construct the code snippet I give for any Turing Machine, and having a DCE would solve the halting problem.

  2. not convinced. return as a blunt statement in a no-branch straight forward execution is not hard to decide. (and my compiler tells me it is capable of figuring this out)

    For the issue njzk2 raises: you are absolutely right, in this case you can determine that there is no way a statement after the return can be reached. This is because it's simple enough that we can describe its unreachability using control-flow graph constraints (i.e. there are no outgoing edges out of a return statement). But there is no perfect dead code eliminator, which eliminates all unused code.

  3. I don't take input-dependent proof for a proof. If there exists such kind of user input that can allow the code to be finite, it's correct for the compiler to assume that following branch is not dead. I can't see what are all these upvotes for, it's both obvious (eg. endless stdin) and wrong.

    For TomášZato: it's not really an input dependent proof. Rather, interpret it as a "forall". It works as follows: assume we have a perfect DCE algorithm. If you give me an arbitrary Turing Machine M and input x, I can use my DCE algorithm to determine whether M halts, by constructing the code snippet above and seeing if the print-statement is removed. This technique, of leaving a parameter arbitrary to prove a forall-statement, is common in math and logic.

    I don't fully understand TomášZato's point about code being finite. Surely the code is finite, but a perfect DCE algorithm must apply to all code, which is an infinte set. Likewise, while the code-itself is finite, the potential sets of input are infinte, as is the potential running-time of the code.

    As for considering the final branch not-dead: it is safe in terms of the "conservative approximation" I talk about, but it's not enough to detect all instances of dead code as the OP asks for.

Consider code like this:

while (true)
  print "Hello"
print "goodbye"

Clearly we can remove print "goodbye" without changing the behavior of the program. Thus, it is dead code. But if there's a different function call instead of (true) in the while condition, then we don't know if we can remove it or not, leading to the undecidability.

Note that I am not coming up with this on my own. It is a well known result in the theory of compilers. It's discussed in The Tiger Book. (You might be able to see where they talk about in in Google books.

$\endgroup$
11
  • 1
    $\begingroup$ @njzk2: We're trying to show it's impossible to build a dead code eliminator that eliminates all dead code, not that it's impossible to build a dead code eliminator that eliminates some dead code. The print-after-return example can be eliminated easily using control-flow graph techniques, but not all dead code can be eliminated this way. $\endgroup$ Commented Nov 11, 2015 at 19:03
  • 4
    $\begingroup$ This answer references comments. As I read the answer, I need to jump down into the comments, then return to the answer. This is confusing (doubly so when you consider that comments are fragile and may be lost). A self-contained answer would be far easier to read. $\endgroup$
    – TRiG
    Commented Nov 12, 2015 at 12:02
  • 1
    $\begingroup$ @TomášZato - consider the program that increments a variable $n$ and checks whether or not $n$ is an odd perfect number, terminating only when it finds such a number. Clearly this program does not depend on any external input. Are you asserting that it can easily be determined whether or not this program terminates? $\endgroup$ Commented Nov 12, 2015 at 14:51
  • 3
    $\begingroup$ @TomášZato You are mistaken in your understanding of the halting problem. Given a finite Turing Machine $M$, and finite input $x$, it's impossible to determine whether $M$ infinitely loops while running on $x$. I haven't proven this rigorously because it has been proved over and over again, and is a fundamental principle of computer science. There's a nice sketch of the proof on Wikipedia $\endgroup$ Commented Nov 12, 2015 at 18:19
  • 1
    $\begingroup$ jmite, please incorporate valid comments into the answer so that the answer stands on its own. Then flag all comments that are obsolete as such so we can clean up. Thanks! $\endgroup$
    – Raphael
    Commented Nov 12, 2015 at 20:26
14
$\begingroup$

This is a twist on jmite's answer that circumvents the potential confusion about non-termination. I'll give a program that always halts itself, may have dead code but we can not (always) algorithmically decide if it has.

Consider the following class of inputs for the dead-code identifier:

simulateMx(n) {
  simulate TM M on input x for n steps
  if M did halt
    return 0
  else
    return 1
}

Since M and x are fixed, simulateMs has dead code with return 0 if and only if M does not halt on x.

This immediately gives us a reduction from the halting problem to dead-code checking: given TM $M$ as halting-problem instance, create above program with x the code of $M$ -- it has dead code if and only if $M$ does not halt on its own code.

Hence, dead-code checking is not computable.

In case you are unfamiliar with reduction as a proof technique in this context, I recommend our reference material.

$\endgroup$
5
$\begingroup$

A simple way to demonstrate this kind of property without getting bogged into details is to use the following lemma:

Lemma: For any compiler C for a Turing-complete language, there exists a function undecidable_but_true() which takes no arguments and returns the boolean true, such that C cannot predict whether undecidable_but_true() returns true or false.

Note that the function depends on the compiler. Given a function undecidable_but_true1(), a compiler can always be augmented with the knowledge of whether this function returns true or false; but there is always some other function undecidable_but_true2() that won't be covered.

Proof: by Rice's theorem, the property “this function returns true” is undecidable. Therefore any static analysis algorithm is unable to decide this property for all possible functions.

Corollary: Given a compiler C, the following program contains dead code which cannot be detected:

if (!undecidable_but_true()) {
    do_stuff();
}

A note about Java: the Java language mandates that compilers reject certain programs that contain unreachable code, while sensibly mandating that code is provided at all reachable points (e.g. control flow in a non-void function must end with a return statement). The language specifies exactly how the unreachable code analysis is performed; if it didn't then it would be impossible to write portable programs. Given a program of the form

some_method () {
    <code whose continuation is unreachable>
    // is throw InternalError() needed here?
}

it is necessary to specify in which cases the unreachable code must be followed by some other code and in which cases it must not be followed by any code. An example of a Java program that contains code that is unreachable, but not in a way that Java compilers are allowed to notice, comes up in Java 101:

String day_of_week(int n) {
    switch (n % 7) {
    case 0: return "Sunday";
    case 1: case -6: return "Monday";
    …
    case 6: case -1: return "Saturday";
    }
    // return or throw is required here, even though this point is unreachable
}
$\endgroup$
5
  • $\begingroup$ Note that some compilers for some languages may be able to detect that the end of day_of_week is unreachable. $\endgroup$ Commented Nov 12, 2015 at 8:50
  • $\begingroup$ @immibis Yes, for example CS101 students can do it in my experience (though admittedly CS101 students aren't a sound static analyzer, they usually forget about the negative cases). That's part of my point: it's an example of a program with unreachable code that a Java compiler will not detect (at least, may warn about, but may not reject). $\endgroup$ Commented Nov 12, 2015 at 9:04
  • 1
    $\begingroup$ I'm afraid the phrasing of the Lemma is misleading at best, with a tint of wrongness to it. Undecidability only makes sense if you phrase it terms of (infinite) sets of instances. (The compiler does produce an answer for every function, and we know that it can not be always correct, but saying that there's a single undecidable instance is off.) Your paragraph between the Lemma and the Proof (which does not quite match the Lemma as stated) tries to fix this, but I think it would be better to formulate a clearly correct lemma. $\endgroup$
    – Raphael
    Commented Nov 12, 2015 at 19:54
  • $\begingroup$ @Raphael Uh? No, the compiler need not produce an answer to the question “is this function constant?”. It doesn't need to distinguish “I don't know” from “no” to produce working code, but that's not relevant here since we're only interested in the static analysis part of the compiler, not in the code translation part. I don't understand what you find misleading or incorrect about the statement of the lemma — unless your point is that I should write “static analyzer” instead of “compiler”? $\endgroup$ Commented Nov 12, 2015 at 20:49
  • $\begingroup$ The statement sounds like "undecidability means that there is an instance that can not be solved", which is wrong. (I know you don't mean to say that, but that's how it can read to the unwary/novices, imho.) $\endgroup$
    – Raphael
    Commented Nov 12, 2015 at 23:24
3
$\begingroup$

jmite's answer applies to whether the program will ever exit a calculation--just because it's infinite I wouldn't call the code after it dead.

However, there's another approach: A problem for which there is an answer but it's unknown:

public void Demo()
{
  if (Chess.Evaluate(new Chessboard(), int.MaxValue) != 0)
    MessageBox.Show("Chess is unfair!");
  else
    MessageBox.Show("Chess is fair!");
}

public class chess
{
  public Int64 Evaluate(Chessboard Board, int SearchDepth)
  {
  ...
  }
}

This routine without a doubt does contain dead code--the function will return an answer that executes one path but not the other. Good luck finding it, though! My memory is no theoretical computer can solve this within the lifespan of the universe.

In more detail:

The Evaluate() function computes which side wins a chess games if both sides play perfectly (with maximum search depth).

Chess evaluators normally look ahead at every possible move some specified depth and then attempt to score the board at that point (sometimes expanding certain branches farther as looking halfway through an exchange or the like can produce a very skewed perception.) Since the actual maximum depth is 17695 half-moves the search is exhaustive, it will traverse every possible chess game. Since all the games end there's no issue of trying to decide how good a position each board is (and thus no reason to look at the board evaluation logic--it will never be called), the result is either a win, a loss or a draw. If the result is a draw the game is fair, if the result is not a draw it's an unfair game. To expand it a bit we get:

public Int64 Evaluate(Chessboard Board, int SearchDepth)
{
  foreach (ChessMove Move in Board.GetPossibleMoves())
    {
      Chessboard NewBoard = Board.MakeMove(Move);
      if (NewBoard.Checkmate()) return int.MaxValue;
      if (NewBoard.Draw()) return 0;
      if (SearchDepth == 0) return NewBoard.Score();
      return -Evaluate(NewBoard, SearchDepth - 1);
    }
}

Note, also, that it will be virtually impossible for the compiler to realize that Chessboard.Score() is dead code. A knowledge of the rules of chess allows us humans to figure this out but to figure this out you have to know that MakeMove can never increase the piece count and that Chessboard.Draw() will return true if the piece count remains static for too long.

Note that the search depth is in half-moves, not whole moves. This is normal for this sort of AI routine as it's an O(x^n) routine--adding one more search ply has a major effect on how long it takes to run.

$\endgroup$
15
  • 8
    $\begingroup$ You assume that a checking algorithm would have to perform the calculation. A common fallacy! No, you don't get to assume anything about how a checker would work, otherwise you can not refute its existence. $\endgroup$
    – Raphael
    Commented Nov 11, 2015 at 9:06
  • 6
    $\begingroup$ The question requests a proof that it is impossible to detect dead code. Your post contains an example of a case where you suspect it would be difficult to detect dead code. That isn't an answer to the question at hand. $\endgroup$ Commented Nov 11, 2015 at 10:19
  • 2
    $\begingroup$ @LorenPechtel I don't know, but that's not a proof. See also here; a cleaner example of your misconception. $\endgroup$
    – Raphael
    Commented Nov 11, 2015 at 23:17
  • 3
    $\begingroup$ If it helps, consider that there's nothing theoretically stopping someone from running their compiler for more than the lifetime of the universe; the only limitation is practicality. A decidable problem is a decidable problem, even if it's in the complexity class NONELEMENTARY. $\endgroup$
    – Pseudonym
    Commented Nov 12, 2015 at 0:09
  • 4
    $\begingroup$ In other words, this answer is at best a heuristic intended to show why it's probably not easy to build a compiler that detects all dead code -- but it's not a proof of impossibility. This kind of example might be useful as a way to build intuition for students, but it is not a proof. By presenting itself as a proof, it does a disservice. The answer should be edited to state that it is an intuition-building example but not a proof of impossibility. $\endgroup$
    – D.W.
    Commented Nov 12, 2015 at 20:36
-3
$\begingroup$

I think in a computing course, the notion of dead code is interesting in the context of understanding the difference between compile time and run time!

A compiler can determine when you've got code that can in no compile-time scenario ever be traversed, but it cannot do so for runtime. a simple while-loop with user input for the loop-break test shows that.

If a compiler could actually determine runtime dead code (i.e. discern Turing complete) then there's an argument that the code never needs be run, because the job's already done!

If nothing else, the existence of code that passes compile-time dead code checks illustrates the need for pragmatic bounds-checking on inputs and general coding hygiene (in the real world of real projects.)

$\endgroup$
4
  • 1
    $\begingroup$ The question asks for a proof that it is impossible to detect dead code. You have not answered that question. $\endgroup$ Commented Nov 12, 2015 at 22:25
  • $\begingroup$ Also, your assertion that "A compiler can determine when you've got code that can in no compile-time scenario ever be traversed" is incorrect and directly contradicts what the question asks you to prove. $\endgroup$ Commented Nov 12, 2015 at 23:41
  • $\begingroup$ @David Richerby, I think you may be misreading me. I'm not suggesting that compile-time checking can find ALL dead code, quite definitely not. I'm suggesting that there is a subset of the set of all dead code that is discernable at compile time. If I write: if(true==false){ print("something");}, that print statement will be discernable at compile time to be dead code. Do you disagree that this is a counterexample to your assertion? $\endgroup$
    – dwoz
    Commented Nov 13, 2015 at 15:07
  • $\begingroup$ Sure, you can determine some dead code. But if you're going to say "determine when [you have dead code]" with no qualifications then that, to me, means find all the dead code, not just some of it. $\endgroup$ Commented Nov 13, 2015 at 20:28

Not the answer you're looking for? Browse other questions tagged or ask your own question.