
I have compiled GCC from source, but I can't seem to fully understand the utility of GCC compiling itself three times.

What benefit does this serve?

This answer says:

  • Build new version of GCC with existing C compiler
  • re-build new version of GCC with the one you just built
  • (optional) repeat step 2 for verification purposes.

Now my question is: once the first step is complete and the compiler is built, why waste time rebuilding it?

Is it just for verification? If so, it seems pretty wasteful.

Things get more complicated over here:

The build for this is more complex than for prior packages, because you’re sending more information into the configure script and the make targets aren’t standard.

I mean, the whole compiler is written in C, right? So why not just do everything in one pass?

What is the use of the 3-phase bootstrap ?

Thanks in advance.

  • The meaning of "it" in the phrase "why waste time rebuilding it?" is different for stage 1 than for stages 2 and 3. See the new and more in-depth answer below; the last two stages are far from a waste of time. Commented May 25, 2020 at 12:52

2 Answers

  • Stages 2 and 3 are a good test for the compiler itself: if it can compile itself (and usually also some libraries like libgcc and libstdc++-v3), then it can chew through non-trivial projects.

  • In stages 2 and 3, you can generate the compiler with different options, for example without optimization (-O0) or with optimization turned on (-O2). Since the output and side effects of a program should not depend on the optimization level used, either version of the compiler must produce the same binary for the same source file, even though the two compilers are themselves very different binaries. This is yet another (run-time) test for the compiler; see the sketch after this list.
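
To make that second point concrete, here is a rough sketch of the check; the compiler names xgcc-stage1 and xgcc-stage2 and the file name are placeholders, and the real build machinery automates this comparison over the whole source tree rather than a single file:

    # xgcc-stage1 was built by the system compiler, xgcc-stage2 by xgcc-stage1,
    # possibly with different optimization levels. Their generated code must still agree.
    ./xgcc-stage1 -O2 -c some-file.c -o some-file.stage1.o
    ./xgcc-stage2 -O2 -c some-file.c -o some-file.stage2.o
    cmp some-file.stage1.o some-file.stage2.o && echo "two different compiler binaries, same output"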

If you prefer a non-bootstrap build for some reason, configure with --disable-bootstrap.
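
For reference, a minimal sketch of both configurations; the paths, install prefix and language list are placeholders rather than recommendations:

    # Default native build (run in an empty build directory):
    # performs the 3-stage bootstrap, including the stage 2 / stage 3 comparison.
    ../gcc-src/configure --prefix=/opt/gcc --enable-languages=c,c++
    make

    # Single pass with the existing system compiler (in another empty build directory):
    ../gcc-src/configure --prefix=/opt/gcc --enable-languages=c,c++ --disable-bootstrap
    make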

  • So that's all it is, a test? I can disable the whole process and do it in one pass?
    – ng.newbie
    Commented Mar 6, 2020 at 16:18
  • Usually a new version of GCC produces better (i.e. faster) code, so the idea of the second build is to produce a compiler which runs faster, taking advantage of the better optimisations implemented in the new version itself. If you were confident that your existing compiler already compiles as well as possible, why are you attempting to upgrade? Conversely, if you are upgrading, why would you not want to take advantage of the upgrade? The (optional) third compile, as the quoted answer says, is to verify that everything works as expected.
    – rici
    Commented Mar 6, 2020 at 19:01
  • @rici The main reasons one usually builds a compiler from source are debugging, development, or making use of highly experimental language features that may require custom patches. Making the compiler itself marginally faster is irrelevant for these.
    – Benno
    Commented Feb 11, 2022 at 9:45

Considering the question from an information theory perspective, the first stage in a three-stage compilation of a compiler does not produce a compiler. It produces a hypothesis that requires experimental verification. The sign of a good compiler distribution package is that it will produce, out of the box and without further work from the system administrator or compiler developer, a working compiler of the distribution's version, with the desired features of that version.

Making that happen is not simple. Consider the variables in the target environment.

  • Target operating system brand
  • Operating system version
  • Operating system settings
  • Shell environment variables
  • Availability of headers for inclusion
  • Availability of libraries for linking
  • Settings passed to the build process
  • Architecture of the target processing unit
  • Number of processing units
  • Bus architecture
  • Other characteristics of the execution model
  • Mistakes the developers of the compiler might make
  • Mistakes the person building the compiler might make

In the GNU compiler tool set, and in many tarball distributions, the program "configure" attempts to produce a build configuration that adapts to as many of the permutations of these as is reasonably possible. Completion of configure without error or warning is not a guarantee that the compiler will function. Furthermore, and more importantly for this question, completion of the build is no guarantee either.

The newly built compiler may function for HelloWorld.c but not for a thousand source files spread across a multi-project, multi-repository body of software called "Intelligent Interplanetary Control and Acquisition System."

Stages two and three are reasonable attempts at checking at least some of the compiler's capabilities, since the compiler source itself is handy and demands quite a bit from the hypothetically working compiler just built.

It is important to understand that the result of stage one and the result of stage two will not match. Their executables and other built artifacts are produced by two different compilers. The stage one result is compiled with whatever C and C++ compiler the build system found in one of the directories listed in the "PATH" variable. The stage two result is compiled with the hypothetically working new compiler. The interesting probabilistic consideration is this:

If the result of using stage one's result to compile the compiler again equals exactly the result of using stage two's result to compile the compiler a third time, then both are likely correct for at least the features that the compiler's source code requires.

That last sentence may need to be reread a dozen times. It's actually a simple idea, but the redundancy of the verb "compile" and the noun "compiler" can tie a knot that takes a few minutes to untie and retie. The source, the target, and the action executed have the same linguistic root, not just once but three times.

The build instructions for the compiler, as of May 25th, 2020, state the converse, which is easier to understand but merely anecdotal, not getting at the crux of why three stages are important.

If the comparison of stage2 and stage3 fails, this normally indicates that the stage2 compiler has compiled GCC incorrectly, and is therefore a potentially serious bug which you should investigate and report.
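
Conceptually, the comparison mentioned in that quote is little more than a byte-for-byte check of the object files built during stage 2 (by the stage 1 compiler) against those built during stage 3 (by the stage 2 compiler). A rough sketch of the idea, with illustrative directory names, since the real build machinery also knows how to ignore harmless debug-info differences:

    # Hypothetical layout: stage2-gcc/ and stage3-gcc/ hold the per-stage object files.
    for obj in stage2-gcc/*.o; do
        cmp "$obj" "stage3-gcc/$(basename "$obj")" || echo "mismatch: $obj"
    done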

If we consider C/C++ development from a reliability assessment, test-first, eXtreme Programming, 6-Sigma, or Total Quality Management perspective, which components in a C/C++ development environment have to be more reliable than the compiler? Not many. And even the three-stage bootstrapping of a compiler that the GNU compiler package has been using since its early days is a reasonable but not an exhaustive test. That's why there are additional tests in the package.

From a continuous integration point of view, the entire body of software under development by those who are about to use the new compiler should be tested before and after the new compiler is compiled and deployed. That's the most convenient way to ensure the new compiler didn't break the build.

With these three reliability checkpoints in place, most people are satisfied:

  1. Ensuring the compiler compiles itself consistently
  2. Running the other tests the compiler developers have put into their distribution (see the command sketch after this list)
  3. Verifying that the developer's or system administrator's own code base is not broken by the upgrade
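
For point 2, the GCC sources ship with a large testsuite that can be run from the build directory once the bootstrap has finished (DejaGnu and related tools must be installed; the exact set of result files varies by release):

    # -k keeps the run going past individual failures;
    # summaries are written to *.sum files under the build tree.
    make -k check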

On a mathematical side note, it is actually impossible to exhaustively test a compiler with the silicon and carbon available on planet Earth. The bounds of recursion in C++ language abstractions (among other things) are infinite, so the silicon or time required to test every permutation of source code cannot realistically exist. On the carbon side, no group of people can free up the requisite time to study the source sufficiently to guarantee that some finite limit is not imposed in some way by the compiler source.

The three levels of checks, only one of which is the three-stage bootstrap process, will likely suffice for most of us.

A further benefit of the three-stage compile is that the final compiler is compiled by the new compiler itself, which presumably generates better code in terms of speed or resource consumption, and possibly both.
